METHOD, COMPUTING ROUTINE, DEVICE FOR PREDICTING PROPERTIES OF
MHC/PEPTIDE COMPLEXES, AND DATA AND PEPTIDES PRODUCED THEREFROM

FIELD OF THE INVENTION 

The present invention relates to a method for structure-based prediction of properties of 
peptides and peptide analogs in complex with major histocompatibility (MHC) class I and class 
II molecules. The said properties mainly relate to the three-dimensional structure of an 
MHC/peptide complex and the binding affinity of a peptide for an MHC receptor. The invention 
further relates to a computer program and a device therefor. The invention further relates to 
data produced by a method of the invention. The invention further relates to peptides and 
peptide analogs predicted to bind to target-MHC molecules. The present invention thus relates 
to the field of immunology, with possible applications in the manufacture of vaccines, the
de-immunization of proteins, and the manufacture of therapeutic agents, especially
immunotherapeutic agents.

BACKGROUND OF THE INVENTION 

Cytotoxic T-cells (Tc or CD8-T lymphocytes) and helper T-cells (Th or CD4-T 
lymphocytes) have the capability of recognizing short, processed fragments of a protein 
antigen, referred to as antigenic peptides or T-cell epitopes. However, recognition does not 
occur by direct binding to free peptides. Specific receptor molecules on T-cells (T-cell 
receptors or TCRs) recognize a peptide antigen only when it is bound to another receptor 
known as a major histocompatibility complex (MHC) molecule. Such MHC-peptide complexes 
serve the role of cell markers: when the MHC contains an endogenous (self) peptide, it marks
the cell as "healthy"; when it contains a foreign peptide, the cell is marked as "infected". The 
MHC-mediated presentation of antigenic peptides to the repertoire of T-cells can thus be seen 
as the primary stimulus to elicit an immune response. Depending on the type of MHC 
presenting an antigen, which is correlated with the type of cell expressing it, the immune 
system is triggered to either destroy the antigen presenting cell or to produce antibodies 
directed against the infectious agent. 

MHC molecules are subdivided into classes I and II. While their general function is the
same (presenting antigen), they differ in a number of aspects. MHC class I is expressed on the
cell surface as a heterodimeric complex between a 46-kDa heavy chain (the α-chain) and a 12-kDa
light chain (the β2-microglobulin or β2m chain). The α-chain consists of three domains, α1,
α2 and α3; the α1 and α2 domains are responsible for binding of a peptide ligand, while the α3
domain is membrane-bound and involved in CD8 co-receptor binding. Class II MHC molecules
have the same overall shape, although they are constituted of two membrane-bound chains:
an α chain of ~35 kDa and a β chain of ~28 kDa. Both the α and the β chain form two domains
(α1 and α2 on the one hand and β1 and β2 on the other). The α1 and β1 domains jointly form the
peptide binding domain. The β2 domain is involved in CD4 co-receptor binding.

Both MHC class I and class II molecules show a high degree of polymorphism. They
have been further subdivided into different subtypes. The existence of different MHC allotypes
lies at the basis of the capacity of MHCs to bind a broad range of peptides while still preserving
some specificity. Given this polymorphism, being able to predict which peptides specifically
bind to which MHC subtypes is thought to be of great value in vaccination strategies and
de-immunization programs. Thanks to the recent burst of information derived from experimentally
determined 3D structures, valuable insights about the determinants of peptide binding
specificity have been obtained. This, in turn, has led to the idea that a structure-based 
prediction of potentially antigenic peptides (or T-cell epitopes) is within reach. 

Functional human leukocyte antigens (HLAs or human MHCs) are characterized by a 
deep binding groove to which endogenous as well as potentially antigenic peptides bind. The 
groove is further characterized by a well-defined shape and physico-chemical properties. HLA 
class I binding sites are closed, in that the peptide termini are pinned down into the ends of the 
groove. They are also involved in a network of hydrogen bonds with conserved HLA residues 
(Madden, D.R. et al., (1992) Cell 70, 1035-1048). In view of these restraints, the length of
bound peptides is limited to 8-10 residues. Superposition of the structures of different HLA 
complexes confirmed a general mode of binding wherein peptides adopt a relatively linear, 
extended conformation. At the same time, a significant variability in the conformation of 
different peptides was also observed. This variability ranges from minor structural differences
to notably different binding modes. Such variation is not unexpected in view of the fact that 
class I molecules can bind thousands of different peptides, varying in length (8-10 residues) 
and in amino acid sequence. The different class I allotypes bind peptides sharing one or two 
conserved amino acid residues at specific positions. These residues are referred to as anchor 
residues and are accommodated in complementary pockets (Falk, K. et al., (1991) Nature
351, 290-296). Besides primary anchors, there are also secondary anchor residues, which are
accommodated in shallower pockets (Matsumura, M. et al., (1992) Science 257, 927-934). In total,
six allele-specific pockets termed A-F have been characterized (Saper, M.A. et al., (1991) J. Mol.
Biol. 219, 277-312; Latron, F. et al., (1992) Science 257, 964-967). The constitution of these
pockets varies in accordance with the polymorphism of class I molecules, giving rise to a
high degree of specificity (limited cross-reactivity) while preserving a broad binding capacity.

In contrast to HLA class I binding sites, class II sites are open at both ends. This allows 
peptides to extend from the actual region of binding, thereby "hanging out" at both ends 
(Brown, J. et al., (1993) Nature 364, 33-39). Class II HLAs can therefore bind peptide ligands
of variable length, ranging from 9 to more than 25 amino acid residues. Similar to HLA class I,
the affinity of a class II ligand is determined by a "constant" and a "variable" component. The
constant part again results from a network of hydrogen bonds formed between conserved 
residues in the HLA class II groove and the main-chain of a bound peptide. However, this 
hydrogen bond pattern is not confined to the N- and C-terminal residues of the peptide but 
distributed over the whole of the chain. The latter is important because it restricts the 
conformation of complexed peptides to a strictly linear mode of binding. This is common for all
class II allotypes. The second component determining the binding affinity of a peptide is 
variable due to certain positions of polymorphism within class II binding sites. Different 
allotypes form different complementary pockets within the groove, thereby accounting for 
subtype-dependent selection of peptides, or specificity. Importantly, the constraints on the 
amino acid residues held within class II pockets are in general "softer" than for class I. There is
much more cross-reactivity of peptides among different HLA class II allotypes. Unlike for class
I, it has been impossible to identify highly conserved residue patterns in peptide ligands
(so-called motifs) that correlate with the class II allotypes.

The different characteristics of class I and class II MHC molecules are responsible for 
specific problems associated with the prediction of potential T-cell epitopes. As discussed 
before, class I molecules bind short peptides that exhibit well-defined residue type patterns. 
This has led to various prediction methods that are based on experimentally determined
statistical preferences for particular residue types at specific positions in the peptide. Although 
these methods work relatively well, uncertainties associated with non-conserved positions limit 
their accuracy. Prediction methods for [...]
that is actually responsible for the binding. This fact, combined with the intrinsically weaker
constraints of the complementary pockets in class II binding grooves, makes the establishment 
of (pseudo-)motifs very difficult (Mallios, R.R. (2001) Bioinformatics 17, 942-948). On the other
hand, class II peptide binding motifs generally include more anchor residues than class I
motifs. 

Methods for MHC/peptide binding prediction can broadly be subdivided into two
categories: "statistical methods" that are driven by experimentally obtained affinity data and
"structure-related methods" that are based on available 3D structural information of MHC
molecules. 

Statistical methods have been promoted under the impulse of a growing amount of 
binding data. Sources of binding information are, typically, elution and pool sequencing of
peptides bound naturally to MHC molecules inside cells (Falk, K. et al., (1994)
Immunogenetics 39, 230-242), phage display of peptide libraries (Hammer, J. et al., (1993) Cell 74,
197-203; Fleckenstein, B. et al., (1999) Sem. Immunol. 11, 405-416), and data sets compiled from
reports in the literature (Brusic, V. et al., (1998) Nucleic Acids Res. 26, 368-371; Rammensee,
H.G. et al., (1999) Immunogenetics 50, 213-219). A common approach is to decompose, in a
statistical way, the available experimental information into MHC type-specific and peptide
residue position-specific numerical values reflecting the preference for individual amino acid
types at that position (Parker, K.C. et al., (1994) J. Immunol. 152, 163-175). The matrices
obtained in this way may then serve as profiles from which the binding affinity of a peptide
sequence of interest can be estimated.
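By way of illustration only, the profile-based estimation described above may be sketched as follows. The matrix values, the additive (log-scale) combination of per-position scores and the names `TOY_MATRIX`, `profile_score` and `rank_peptides` are assumptions made for this sketch; they are not taken from the cited references and do not represent experimental data.

```python
# Hypothetical position-specific scores for a 9-residue peptide (0-based positions).
# These numbers are invented for illustration and are NOT experimental preferences.
TOY_MATRIX = {
    1: {"L": 1.5, "M": 1.0},   # toy primary anchor at peptide position 2
    3: {"E": 0.3},             # toy secondary preference at position 4
    8: {"V": 1.2, "L": 0.9},   # toy C-terminal anchor at position 9
}


def profile_score(peptide, matrix, default=0.0):
    """Sum the position-specific score of every residue in the peptide.

    Residue/position pairs absent from the matrix contribute `default`.
    """
    return sum(matrix.get(i, {}).get(aa, default) for i, aa in enumerate(peptide))


def rank_peptides(peptides, matrix):
    """Return the peptides sorted from highest to lowest profile score."""
    return sorted(peptides, key=lambda p: profile_score(p, matrix), reverse=True)


if __name__ == "__main__":
    candidates = ["ALAEAAAAV", "AAAAAAAAA", "AMAAAAAAL"]
    for pep in rank_peptides(candidates, TOY_MATRIX):
        print(pep, round(profile_score(pep, TOY_MATRIX), 2))
```

Such a profile treats each peptide position independently, which is one reason why non-conserved positions limit the accuracy of this class of methods.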

Structure-based methods generally include a first step wherein the structure of a
specific MHC/peptide complex is modeled and a second step wherein the binding strength of
the peptide is estimated from the modeled complex in accordance with an empirical scoring
function. Examples include WO 98/59244; Altuvia, Y. et al., (1995) J. Mol. Biol. 249, 244-250;
and Doytchinova, I.A. and Flower, D.R. (2001) J. Med. Chem. 44, 3572-3581. Alternatively, a
molecular dynamics simulation is sometimes performed to model a peptide within an MHC
binding groove (Lim, J.S. et al., (1996) Mol. Immunol. 33, 221-230). Another approach is to
combine loop modeling with simulated annealing (Rognan, D. et al., (1999) J. Med. Chem. 42,
4650-4658). Most research groups emphasize the importance of the scoring function used in
the affinity prediction step. Schueler-Furman et al. (Schueler-Furman, O. et al., (2000) Prot.
Sci. 9, 1838-1864) apply a statistical potential to evaluate the contacts between the peptide
and the MHC receptor. Rognan et al. (1999) rely on a quantification of physicochemical effects
(like H-bond formation, lipophilic contacts, desolvation, etc.). Swain et al. (Swain, M.T., et al.,
(2001) Proceedings of the second IEEE International Symposium on Bioinformatics and
Biomedical Engineering, IEEE Computer Society Press, Bethesda, Maryland, pp. 81-88) also
apply a heuristic scoring function based on inter-atomic contacts, electrostatic interactions and
H-bond formation. Doytchinova and Flower (2001) consider essentially the same contributions
but follow a quantitative structure-affinity relationship (QSAR) method to assess the binding
affinity. Logean et al. (Logean, A., et al., (2001) Bioinorg. & Med. Chem. Letters 11, 675-679)
have analyzed the performance of 7 universal scoring functions. They found that many of 
these scoring functions yield poor correlation with experiment, in contrast to their "Fresno" 
scoring function. However, it was also recognized that the Fresno function cannot be 
universally applied but requires recalibration for different protein-ligand systems. 

There is a need to substantially improve both the structure prediction and the affinity 
assessment steps of methods which predict the affinity of a peptide for a major 
histocompatibility (MHC) class I or class II molecule. The main problem encountered in this
field is the poor performance of prediction algorithms with respect to MHC alleles for which 
experimentally determined data (both binding and structural information) are scarce. It is an 
aim of the present invention to provide a novel method for predicting the affinity of a peptide for 
a major histocompatibility (MHC) class I or class II molecule, also in cases where experimental
information is scarce.

SUMMARY OF THE INVENTION 

The present invention relates to a method for predicting the binding affinity of a peptide
for a major histocompatibility (MHC) class I or class II molecule, comprising the following
steps:

(a) receiving a representation of a complete or partial three-dimensional structure
of an MHC class I or class II molecule,

(b) obtaining an ensemble of representations of peptide backbone structures of
said peptide, said representations located within the binding site of said MHC molecule,

(c) modeling for each peptide backbone structure of said ensemble in relation to
said MHC molecule, at least the side-chains of said peptide, thereby obtaining an ensemble of
modeled MHC/peptide complexes, and

(d) evaluating the binding properties of said peptide for said MHC molecule,
comprising at least:

(d1) evaluating one or more components of the potential energy of each
complex of the ensemble of step (c), and

(d2) evaluating the conformational entropy for the complete ensemble of
step (c).
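The flow of steps (a) to (d) may be sketched, purely as an illustration, by the skeleton below. Every name in it (`evaluate_binding`, `get_backbones`, `place_side_chains`, `complex_energy`, `ensemble_entropy_term`) is hypothetical; the four callables stand in for whatever modeling and scoring routines an implementation would supply, and the sketch makes no claim about how those routines work.

```python
def evaluate_binding(mhc, peptide, get_backbones, place_side_chains,
                     complex_energy, ensemble_entropy_term):
    """Chain steps (a)-(d) using caller-supplied (hypothetical) routines.

    mhc     -- step (a): a representation of the MHC structure, passed through as-is
    peptide -- the peptide sequence under evaluation
    """
    # (b) ensemble of peptide backbone structures located in the binding site
    backbones = get_backbones(mhc, peptide)
    # (c) model at least the peptide side-chains on every backbone
    complexes = [place_side_chains(mhc, peptide, bb) for bb in backbones]
    # (d1) potential energy component(s), one value per modeled complex
    energies = [complex_energy(c) for c in complexes]
    # (d2) a single entropy term computed over the complete ensemble
    entropy_term = ensemble_entropy_term(energies)
    return {"complexes": complexes, "energies": energies, "entropy_term": entropy_term}


if __name__ == "__main__":
    # Toy stand-ins, chosen only so that the skeleton runs end to end.
    result = evaluate_binding(
        "toy-MHC", "ALAEAAAAV",
        get_backbones=lambda m, p: ["bb1", "bb2"],
        place_side_chains=lambda m, p, bb: (m, p, bb),
        complex_energy=lambda c: -10.0 if c[2] == "bb1" else -8.0,
        ensemble_entropy_term=lambda es: 0.5 * len(es),
    )
    print(result["energies"], result["entropy_term"])
```

Keeping backbone generation, side-chain placement and scoring as separate callables mirrors the separation of steps (b), (c) and (d) in the method.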

An accurate and efficient method is provided which uses a three-dimensional structure 
to predict the binding affinity of an MHC molecule/peptide complex. It fulfills a need for 
obtaining structural and physicochemical data on peptide/MHC complexes, without the
requirement for laboratory equipment, space, expertise and time. Furthermore, it provides the 
means to screen large numbers of potentially antigenic peptides and further provides the 
means for creating a database which may be examined for trends or which may be used as 
the basis for other experiments. 

A step which obtains an ensemble of backbone structures and a separate step which
models the side-chains offer the advantages of sampling the conformational space of the
backbone more efficiently, reducing the computational time required to model the side-chains,
and providing a more accurate overall model of the complex(es).

Combining potential energy and conformational entropy in the evaluation step leads to 
an improved accuracy in the prediction of the binding affinity. The present inventors have
observed a surprising improvement in the correlation between experimentally determined
and predicted binding affinities when both components are explicitly computed.

In one embodiment of the present invention the said representation of step (a) is
obtained from one of the following:

- one or more experimentally determined structures obtained by, for example, X-ray
crystallography, nuclear magnetic resonance spectroscopy, scanning microscopy,
or

- one or more models derived from one or more experimentally determined 
structures, whereby said experimentally determined structures have a high 
sequence identity to said MHC molecule. 

The option to use experimentally determined structures leads to a more accurate
prediction of the affinity of the complex since the said structures have been experimentally
validated and may have a higher degree of accuracy. The option to use computer-modeled
structures may allow the prediction of affinities of peptides for MHC molecules in complexes for
which no or only partial MHC molecule structures exist. Since more MHC molecules are known
than structures have been experimentally solved, the use of modeled structures allows the
prediction of otherwise unobtainable complex affinity data, filling the growing need for such
information.

WO 03/105058 PCT/EP03/06049

In another embodiment of the present invention the ensemble of step (b) is generated 
by a computer modeling method, said method being able to generate multiple energetically 
favorable backbone configurations in relation to the MHC molecule. The use of modeling to 
generate said ensemble allows the available conformational space to be sampled efficiently, 
for example in a fashion that is specific for the sequence of said peptide. This provides
validation for allowable conformations, and may also provide a more accurate assessment of 
properties of the complex. 

In another embodiment of the present invention the representation of step (b) is 
retrieved from a library of peptide structures pre-oriented in relation to the MHC molecule. The
use of a library provides the opportunity of a drastic reduction of the computational time per
peptide, since the alternative is to use simulations, which may be extremely demanding in
computing time due to the large search space. An indirect advantage is the fact that the
prediction accuracy can be improved because a large number of pre-oriented peptide
structures may be retrieved, and more attention can be paid to the important side-chain 
placement and affinity prediction steps. 
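As a minimal sketch of how retrieval from such a library might look, the example below stores pre-oriented backbones together with an indication of the associated MHC molecule and returns all entries matching a given MHC molecule and peptide length. The class names, the choice of C-alpha coordinates as the representation, and the lookup key (MHC name plus peptide length) are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneEntry:
    """One pre-oriented peptide backbone (hypothetical record layout)."""
    mhc_name: str     # indication of the MHC molecule the backbone is oriented against
    length: int       # number of peptide residues
    ca_coords: tuple  # one (x, y, z) tuple per residue, expressed in the MHC frame


class BackboneLibrary:
    """In-memory library indexed by (MHC name, peptide length)."""

    def __init__(self):
        self._index = {}

    def add(self, entry):
        if len(entry.ca_coords) != entry.length:
            raise ValueError("coordinate count does not match peptide length")
        self._index.setdefault((entry.mhc_name, entry.length), []).append(entry)

    def retrieve(self, mhc_name, peptide):
        """Return every stored backbone usable for this MHC and peptide length."""
        return list(self._index.get((mhc_name, len(peptide)), []))


if __name__ == "__main__":
    library = BackboneLibrary()
    toy_coords = tuple((3.8 * i, 0.0, 0.0) for i in range(9))  # toy extended chain
    library.add(BackboneEntry("HLA-A*0201", 9, toy_coords))
    print(len(library.retrieve("HLA-A*0201", "ALAEAAAAV")))
```

Because retrieval is a dictionary lookup, its cost per peptide is negligible compared with a simulation, which is the time saving the passage above refers to.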

In yet another embodiment of the present invention a complex within said ensemble of 
step (c) is obtained from a side-chain placement algorithm. The use of a side-chain placement
algorithm decouples the side-chain sampling from the main-chain sampling, thereby providing an
opportunity to increase the speed and accuracy of the calculation.

In yet another embodiment of the present invention the side-chain placement of step (c)
not only involves placing the side-chains of the peptide itself, but also involves placing one or
more side-chains of said MHC molecule that are in contact with said peptide. The use of
side-chain placement for both the peptide and the MHC molecule provides the opportunity to generate
more accurate models and hence to increase the accuracy of the predicted affinity of the
complex.

In yet another embodiment of the present invention a complex within said ensemble of
step (c) is obtained from a side-chain placement algorithm suited for global side-chain
optimization. The globally optimal placement of side-chains generally yields more accurate
predictions compared to local optimization.

In yet another embodiment of the present invention the side-chain placement algorithm
of a method above comprises a dead-end elimination (DEE) algorithm, characterized in that
said DEE algorithm eliminates rotameric conformations on the basis of a mathematical
criterion that allows the detection of conformations that are not compatible with the globally
optimal conformation. The DEE approach is helpful in solving the combinatorial search
problem by reducing the number of possible rotamers which need to be tested, thereby greatly
increasing the speed of global side-chain optimization.
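The elimination principle can be illustrated with the following minimal sketch. It assumes a pairwise-decomposable energy (a self energy per rotamer plus pair energies between rotamers of different residues) and applies the original DEE inequality: rotamer r of residue i is eliminated when its best-case energy exceeds the worst-case energy of some alternative rotamer t of the same residue. The text above does not state which DEE variant is used, so this choice, the toy energy tables and all function names are assumptions for illustration.

```python
from itertools import product


def pair(E_pair, i, r, j, s):
    """Pair energy between rotamer r of residue i and rotamer s of residue j."""
    return E_pair[(i, j)][r][s] if i < j else E_pair[(j, i)][s][r]


def dead_end_elimination(E_self, E_pair):
    """Return, per residue, the set of rotamers that survive DEE."""
    n = len(E_self)
    alive = [set(range(len(e))) for e in E_self]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                if len(alive[i]) == 1:
                    break
                # best case for r: most favorable partner rotamer at every other residue
                best_r = E_self[i][r] + sum(
                    min(pair(E_pair, i, r, j, s) for s in alive[j]) for j in others)
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # worst case for t: least favorable partner rotamer everywhere
                    worst_t = E_self[i][t] + sum(
                        max(pair(E_pair, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:  # r cannot be part of the global optimum
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(E_self, E_pair, assignment):
    n = len(assignment)
    e = sum(E_self[i][assignment[i]] for i in range(n))
    e += sum(E_pair[(i, j)][assignment[i]][assignment[j]]
             for i in range(n) for j in range(i + 1, n))
    return e


def brute_force_minimum(E_self, E_pair, alive):
    """Exhaustive search over the surviving rotamers only."""
    return min(product(*[sorted(a) for a in alive]),
               key=lambda a: total_energy(E_self, E_pair, a))


# Toy energies for three residues with two rotamers each (invented values).
E_SELF = [[0.0, 5.0], [0.0, 1.0], [0.0, 4.0]]
E_PAIR = {
    (0, 1): [[0.0, 0.5], [0.5, 0.0]],
    (0, 2): [[0.0, 0.2], [0.2, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, -1.5]],
}

if __name__ == "__main__":
    survivors = dead_end_elimination(E_SELF, E_PAIR)
    print(survivors, brute_force_minimum(E_SELF, E_PAIR, survivors))
```

Because the inequality is strict, a rotamer belonging to a globally optimal conformation is never removed; the exhaustive search over the survivors therefore finds the same minimum as a search over the full rotamer space, only faster.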

In yet another embodiment of the present invention the side-chain placement algorithm
of a method above comprises a FASTER algorithm (Desmet J. et al. (2002) Proteins 48, 31-43),
said algorithm being characterized essentially by a repeated perturbation, relaxation and
evaluation step. The FASTER algorithm improves the side-chain prediction accuracy at a low
computational cost, and hence makes provision for more accurate predictions of binding
affinity.
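The repeated perturbation, relaxation and evaluation cycle can be pictured with the simplified sketch below. It is not the published FASTER algorithm: it is a generic loop of the same shape, written under the assumption of a pairwise-decomposable energy, in which one rotamer is temporarily fixed (perturbation), the remaining residues each adopt their locally best rotamer in a single greedy pass (relaxation), and the resulting conformation is kept only if its total energy is lower (evaluation). The toy energy tables and all names are hypothetical.

```python
def pair(E_pair, i, r, j, s):
    """Pair energy between rotamer r of residue i and rotamer s of residue j."""
    return E_pair[(i, j)][r][s] if i < j else E_pair[(j, i)][s][r]


def total_energy(E_self, E_pair, a):
    n = len(a)
    e = sum(E_self[i][a[i]] for i in range(n))
    e += sum(E_pair[(i, j)][a[i]][a[j]] for i in range(n) for j in range(i + 1, n))
    return e


def relax(E_self, E_pair, assignment, fixed):
    """One greedy pass: every residue except `fixed` takes its locally best rotamer."""
    a = list(assignment)
    n = len(a)
    for j in range(n):
        if j == fixed:
            continue

        def local(s, j=j):
            # energy of rotamer s at residue j in the current environment
            return E_self[j][s] + sum(
                pair(E_pair, j, s, k, a[k]) for k in range(n) if k != j)

        a[j] = min(range(len(E_self[j])), key=local)
    return a


def perturb_relax_evaluate(E_self, E_pair, start):
    """Repeat perturbation/relaxation/evaluation until no move lowers the energy."""
    best = list(start)
    best_e = total_energy(E_self, E_pair, best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for r in range(len(E_self[i])):
                if r == best[i]:
                    continue
                trial = list(best)
                trial[i] = r                                   # perturbation
                trial = relax(E_self, E_pair, trial, fixed=i)  # relaxation
                e = total_energy(E_self, E_pair, trial)        # evaluation
                if e < best_e - 1e-12:
                    best, best_e, improved = trial, e, True
    return best, best_e


# Toy energies for three residues with two rotamers each (invented values).
E_SELF = [[0.0, 5.0], [0.0, 1.0], [0.0, 4.0]]
E_PAIR = {
    (0, 1): [[0.0, 0.5], [0.5, 0.0]],
    (0, 2): [[0.0, 0.2], [0.2, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, -1.5]],
}

if __name__ == "__main__":
    print(perturb_relax_evaluate(E_SELF, E_PAIR, start=[1, 1, 1]))
```

A loop of this kind is heuristic: it terminates because every accepted move strictly lowers the energy, but unlike an exhaustive search it offers no guarantee of reaching the global optimum.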

In yet another embodiment of the present invention the binding affinity of step (d) of a
method above is represented by a single scoring value for the whole ensemble of
MHC/peptide complexes, said scoring value comprising the sum of the conformational entropy
for the complete ensemble of MHC/peptide complexes, and the average of the said energetical
components of each of the complexes of said ensemble. Conformational entropy is a
fundamental property of a complex that is preferably computed from an ensemble of
structures. The explicit inclusion of conformational entropy contributes in a favorable way to
the correlation between predicted and experimental affinities. Furthermore, the incorporation of
significant energetic components, in combination with an entropical component, allows a more
accurate assessment of the affinity of the complex.
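One possible way of forming such a single scoring value is sketched below. The text above specifies only that the score combines an ensemble entropy with the average of the energetic components; the particular entropy estimate used here (a Boltzmann-weighted Shannon entropy over the ensemble members), the sign convention (the entropy enters as -T·S), the units (kcal/mol) and the function name `ensemble_score` are all assumptions made for this illustration.

```python
import math

K_B = 0.0019872041  # gas constant in kcal/(mol*K); assumes energies in kcal/mol


def ensemble_score(energies, temperature=298.15):
    """Return (score, mean_energy, entropy_term) for an ensemble of complexes.

    score        = mean_energy + entropy_term
    entropy_term = -T * S, with S estimated from Boltzmann weights of the members
    """
    kT = K_B * temperature
    e_min = min(energies)
    # shift by the minimum for numerical stability; probabilities are unaffected
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    entropy = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)
    mean_energy = sum(energies) / len(energies)
    entropy_term = -temperature * entropy
    return mean_energy + entropy_term, mean_energy, entropy_term


if __name__ == "__main__":
    # Two toy ensembles with the same lowest energy but different spread.
    print(ensemble_score([-10.0, -9.9, -9.8, -9.9]))  # several near-equal states
    print(ensemble_score([-10.0, -4.0, -3.0, -2.0]))  # one dominant state
```

Under these assumptions an ensemble with many near-isoenergetic members receives a more favorable entropy term than one dominated by a single conformation, which is the kind of flexibility information a single-structure score cannot capture.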

In yet another embodiment of the present invention the binding properties of step (d) of 
a method above are evaluated for the global complex, thereby accounting for interactions 
between pairs of residues from the peptide, the MHC molecule and both the peptide and the 
MHC molecule. The use of global scoring which accounts for interactions between said pairs of
residues provides a more accurate assessment of the global energy of the system and hence
provides a more exact measure of the affinity of the complex. 
In yet another embodiment of the present invention the entropical component of a
method above reflects the overall conformational flexibility of the peptide. Conformational
flexibility is a fundamental property of complexes that is non-trivial to simulate or quantify.
Furthermore, the simulation and quantification of conformational flexibility may provide useful
insights.

In yet another embodiment of the present invention the representations of said peptide 
contained in said library of a method above are derived from experimentally determined
structures. The presence of experimentally determined structures in the library provides the
option to use structures which have been experimentally validated. Said structures may have a
higher degree of accuracy and consequently lead to a more accurate prediction of the affinity 
of the complex. 

In yet another embodiment of the present invention the representations of said peptide
contained in said library of a method above are derived from computer-generated structures,
said structures generated by said computer modeling method described above. The presence
of computer-modeled structures in the library may allow the prediction of peptide affinities for
MHC molecules in complexes for which no or only partial structural information is available. 
Since only few complex structures have been experimentally solved, the use of modeled 
structures allows structure-based affinity prediction for complexes of unknown structure, filling 
the growing need for such information. 

In yet another embodiment of the present invention said peptide of a method above 
comprises one or more non-naturally occurring amino acids. The use of non-naturally
occurring amino acids provides the possibility of obtaining affinity data for compounds in
which the feature provides additional properties, for example a therapeutic property, increased
in vivo stability, increased intrinsic activity, or reduced toxicity.

In yet another embodiment the invention relates to a method for producing an 
immunogenic peptide comprising an MHC class I or class II restricted T cell epitope that binds
to an MHC class I or class II molecule and induces an MHC class I or II -restricted cytotoxic T 
cell response, said method comprising steps of: 

(a) providing an amino acid sequence of a polypeptide of interest; 

(b) preparing one or more overlapping putative immunogenic peptide fragments of said
polypeptide of interest, for instance consisting of 8 to 20 amino acids;

(c) receiving a representation of a complete or partial three-dimensional structure of
said MHC class I or class II molecule,

(d) obtaining an ensemble of representations of peptide backbone structures of said
putative immunogenic peptides, said representations located within the binding site of
said MHC molecule,

(e) modeling for said peptide backbone structures of said ensemble in relation to said
MHC molecule, at least the side-chains of said putative immunogenic peptide, thereby
obtaining an ensemble of modeled MHC/peptide complexes,

(f) evaluating the binding properties of said putative immunogenic peptides for said 
MHC molecule, comprising at least: 

f1) evaluating one or more components of the potential energy of each complex 
of the ensemble, 

f2) evaluating the conformational entropy for the complete ensemble of each
MHC/peptide complex,

(g) inferring from the results obtained in (f), one or more putative immunogenic peptides
that bind to said MHC molecule,

(h) optionally preparing one or more of said putative immunogenic peptides of said
polypeptide of interest,

(i) optionally testing complexes of said one or more putative immunogenic peptides
and said MHC molecule for an ability to be recognized by MHC-restricted cytotoxic T cells, and to
thereby induce a cytotoxic T cell response to the epitope, and

(j) optionally selecting said one or more putative immunogenic fragments comprising
an MHC class I or class II binding site that induce an MHC class I or class II cytotoxic
T cell response to the epitope.

In a preferred embodiment, the one or more overlapping putative immunogenic peptide
fragments of said polypeptide of interest consist of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25 or more amino acids.
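The preparation of overlapping fragments from a polypeptide sequence can be illustrated by a simple sliding window, as sketched below. The function name `overlapping_fragments`, the step size of one residue and the default length range of 8 to 20 residues (taken from the example given in step (b)) are choices made for this sketch; other window lengths or step sizes could equally be used.

```python
def overlapping_fragments(sequence, min_len=8, max_len=20):
    """Yield (start_index, fragment) for every window of each allowed length.

    Windows advance one residue at a time, so consecutive fragments of the
    same length overlap by length - 1 residues.
    """
    seq = sequence.strip().upper()
    for length in range(min_len, max_len + 1):
        for start in range(0, len(seq) - length + 1):
            yield start, seq[start:start + length]


if __name__ == "__main__":
    toy_sequence = "ABCDEFGHIJ"  # placeholder letters, not a real polypeptide
    for start, fragment in overlapping_fragments(toy_sequence, 8, 9):
        print(start, fragment)
```

Each fragment produced in this way would then be passed individually through steps (c) to (f).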

In a further embodiment of the present invention said representation of step (c) is 
obtained from one of the following: 

- one or more experimentally determined structures obtained by for example X-ray 
crystallography, nuclear magnetic resonance spectroscopy, scanning microscopy, 
or 
- one or more models derived from an experimentally determined structure, whereby
said experimentally determined structure has a high sequence identity to said MHC 
molecule. 

In a further embodiment of the present invention said representation of step (d) is 
generated by a computer modeling method, said method being able to generate multiple 
energetically favorable backbone configurations in relation to said MHC molecule. 

In a further embodiment of the present invention said representation of step (d) is 
retrieved from a library of peptide structures pre-oriented in relation to said MHC molecule. 

In a still further embodiment of the present invention a complex within said 
ensemble of step (e) is obtained from a side-chain placement algorithm. 

In a further embodiment of the present invention the side-chain placement of step (e) 
not only involves placing the side-chains of the peptide itself, but also involves placing at least
one side-chain of said MHC molecule that is in contact with said peptide.

In another embodiment of the present invention a complex within said ensemble of step 
(e) is obtained from a side-chain placement algorithm suited for global side-chain optimization. 

In a further embodiment of the present invention the side-chain placement algorithm is 
a dead-end elimination (DEE) algorithm, characterized in that said DEE algorithm eliminates 
rotameric conformations on the basis of a mathematical criterion that allows the detection of 
conformations that are not compatible with the globally optimal conformation. 

In a further embodiment of the present invention the side-chain placement algorithm is 
a FASTER algorithm, said algorithm being characterized by a repeated perturbation, 
relaxation and evaluation step. 

In a further embodiment of the present invention the binding affinity of step (f) is 
represented by a single scoring value for the whole ensemble of MHC/peptide complexes, said 
scoring value comprising the sum of the conformational entropy for the complete ensemble of 
MHC/peptide complexes, and the average of the said energetical components of each of the 
complexes of said ensemble. 

In a further embodiment of the invention the binding affinity of step (f) is evaluated for 
the global complex, thereby accounting for interactions between pairs of residues from the 
peptide, the MHC molecule and both the peptide and the MHC molecule. 
In a further embodiment of the invention the entropical component reflects the overall
conformational flexibility of the peptide.

In a further embodiment of the invention the representations of said peptide
contained in said library are derived from experimentally determined structures.

In a further embodiment of the present invention the representations of said peptide
contained in said library are derived from computer-generated structures, said structures
generated by said computer modeling method of claim 18.

In a still further embodiment of the present invention said peptide comprises one or 
more non-naturally occurring amino acids. 

In yet another embodiment the present invention relates to any method herein 
described wherein said MHC class I molecule comprises an HLA antigen selected from any of 
the HLA-A, HLA-B, HLA-C, HLA-E, HLA-F and HLA-G genes or gene products or a gene
product from any of the alleles of these genes. 

In yet another embodiment the present invention relates to any method herein
described wherein said MHC class II molecule comprises an HLA antigen selected from any of
the HLA-DR, HLA-DQ and HLA-DP genes or gene products or a gene product from any of the
alleles of these genes. Some non-limiting examples of HLA alleles can be found for instance at
the following web address: http://www.anthonynolan.com/HIG/lists/class1list.html

A further embodiment of the present invention is data comprising:

- representations of one or more peptide backbone structures, each peptide 
demonstrating an interaction with an MHC class I or class II molecule, and 

- an indication of the MHC molecule associated with said representation. 

Data comprising information about MHC molecules, peptides, and complexes of both
provide a source for data-mining of, for example, therapeutically useful peptides. Structural
information, represented as data, obviates the need to model said structures using methods 
known in the art, so providing a significant time- and hence cost-saving. 

A further embodiment of the present invention is a computer program comprising
computing routines, stored on a computer readable medium for evaluating the binding affinity
of a peptide for an MHC class I or class II molecule, said routines comprising:

- receiving an ensemble of representations of structures of the complex between said
MHC molecule and said peptide,

- evaluating the potential energy of each complex of the ensemble,

- evaluating the conformational entropy for the complete ensemble. 

A computer routine for evaluating the binding affinity of a peptide for an MHC molecule 
provides the advantage of speed and allows for the integration with other routines. By 
integrating the routine, the possibility exists, for example, for automation, efficient transfer of 
data and the provision of tools for the interpretation of data. 

Another embodiment of the present invention is a computer program as described
above, further comprising modeling for each peptide backbone structure of said ensemble in
relation to said MHC molecule, at least the side-chains of said peptide. 

Another embodiment of the present invention is a computer program as described 
above, wherein said peptide backbone structures are obtained by computer modeling or by 
retrieval from a database.

An embodiment of the present invention is a device for evaluating the binding affinity of 
a peptide for an MHC class I or class II molecule, comprising: 

- receiving an ensemble of representations of structures of the complex between said 
MHC molecule and said peptide, 

- evaluating the potential energy of each complex of the ensemble, 

- evaluating the conformational entropy for the complete ensemble. 

A device which performs a method of the present invention relieves the user of the 
task of performing the said method, so offering a time- and cost-saving. 

A further embodiment of the present invention is an (unknown) peptide which binds 
MHC class I or class II molecules, said peptide being obtainable by using a method above. 

A further embodiment of the present invention is an (unknown) peptide which binds 
MHC class I or class II molecules, said peptide being obtained by using a method above. 

Another embodiment of the present invention is a nucleic acid (capable of) encoding a 
peptide as defined above. 

Another embodiment of the present invention is a nucleic acid of at least 15 nucleotides 
in length (capable of) specifically hybridizing with the nucleic acid as defined above. 

Another embodiment of the present invention is an antibody specifically recognizing a 
peptide as defined above. 

Yet another embodiment of the present invention is an antibody specifically recognizing 
a nucleic acid as defined above. 

Yet another embodiment of the present invention is a method for producing a peptide 
as defined above comprising: 

(i) culturing host cells comprising a nucleic acid as defined above, under 
conditions allowing the expression of the peptide, and, 

(ii) recovering the produced peptide from the culture. 

Yet another embodiment of the present invention is a peptide as defined above for use 
as a medicament. 



Yet another embodiment of the present invention is a nucleic acid as defined above for 
use as a medicament. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates to a method for structure-based prediction of the affinity 
of potentially antigenic peptides for major histocompatibility (MHC) receptors. More specifically, 
it relates to a method that provides a quantitative assessment of the affinity of a selected 
peptide sequence for a selected MHC allotype by (i) analyzing the three-dimensional structure 
of an MHC peptide binding domain, (ii) generating multiple conformations for the backbone of 
the selected peptide, (iii) optimizing the side-chain conformation for each MHC/peptide 
main-chain structure, and (iv) computing the expected binding affinity of the MHC/peptide 
complex, thereby including a conformational entropy component derived from the set of 
generated conformations. The application of this method to multiple peptides and/or multiple 
MHC receptor types may be helpful to identify the most antigenic peptides originating from a 
common source, for example from a specific viral or bacterial species or a therapeutic protein 
molecule. This, in turn, may be useful in vaccination or de-immunization applications. 

In one embodiment of the present invention, a first step comprises retrieving an 
experimentally determined three-dimensional (3D) structure for a selected MHC class I or 
class II allotype. If a suitable 3D structure is not available, it is modeled by 
homology to a known structure which preferably has a maximal amino acid sequence identity 
with the selected MHC allotype. The retrieved or modeled structure consists, at least, of those 
amino acid residues forming the peptide binding site. 

In a second step, multiple conformations for the main-chain of the selected peptide are 
generated, either by retrieval from an MHC/peptide main-chain library or by a suitable 
computer modeling algorithm, preferably a docking algorithm. The said library may be a 
compilation of experimentally determined structures or structures generated in advance by a 
suitable computer modeling algorithm, preferably a docking algorithm. 

In a third step, for each peptide main-chain conformation generated in the second step, 
the conformations of the side-chains of the selected peptide are modeled by applying a suitable 
side-chain placement algorithm, preferably a FASTER or a DEE method, in conjunction with a 
first energy-based scoring function, preferably a potential or free energy function. The 
co-modeling of the MHC receptor structure with that of the peptide is a preferred option. The 
result of this third step is a set of full complex structures at atomic level of detail. 

In a fourth step, the ensemble of modeled structures obtained in the third step is 
evaluated in accordance with a second scoring function hereinafter called the "affinity scoring 
function". The latter is suited especially to evaluate the binding affinity of a peptide ligand to a 
receptor. The affinity scoring function preferably includes components related to the 
conformational energy, the effect of solvent, and parametrized amino acid type-based terms. 
An essential component of the affinity function is the incorporation of an entropical contribution, 
preferably derived in accordance with statistical mechanical laws and applied to the complete 
ensemble of modeled structures, as generated in the third step. The explicit generation of 
structural ensembles is intended to account for, essentially, the conformational freedom (or 
flexibility, micro-states, entropy etc.) of the complex. 

A method of the present invention concerns the quantitative prediction of the binding 
affinity of a given peptide for a given MHC allotype. A method might be applied to multiple 
peptides and/or multiple receptors by repeated application of the basic method for a single 
peptide/receptor system. 
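
By way of illustration only, the repeated application to multiple peptides and multiple receptors may be sketched as follows. The sliding-window enumeration of fragments, the function names and the `predict_affinity` callback (standing in for the complete single peptide/receptor method) are assumptions made for this sketch and are not prescribed by the method itself.

```python
from itertools import product


def enumerate_peptides(protein_sequence, length=9):
    """Return all overlapping fragments of the given length (assumed sliding window)."""
    return [protein_sequence[i:i + length]
            for i in range(len(protein_sequence) - length + 1)]


def scan(protein_sequence, allotypes, predict_affinity, length=9):
    """Apply a single-system predictor to every peptide/allotype pair.

    predict_affinity(peptide, allotype) is a hypothetical placeholder for the
    basic method; lower scores are taken to mean higher predicted affinity.
    Returns (score, peptide, allotype) tuples sorted from best to worst.
    """
    peptides = enumerate_peptides(protein_sequence, length)
    results = [(predict_affinity(peptide, allotype), peptide, allotype)
               for peptide, allotype in product(peptides, allotypes)]
    return sorted(results)
```

The sorted output places the peptide/allotype pairs with the lowest score first, which corresponds to identifying the most promising candidates from a common protein source.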

In one embodiment of the invention, the considered MHC molecules are of any class, 
preferably of class I and class II. 

In another embodiment of the present invention, there are no limitations to the amino 
acid composition or the length of the simulated peptide. In another embodiment, the length of 
simulated class I-binding peptides is less than 30 residues, preferably less than 20 and more 
preferably between 8 and 10 residues. In another embodiment, the length of class II simulated 
peptides is less than 30 residues, preferably less than 20 and more preferably restricted to 
nonapeptides (9-residue peptides) in view of the experimental evidence that fragments of this 
length form the region of contact with the receptor binding groove. 

A method of the present invention relates to the quantitative prediction of affinity 
values. Properties that are directly related to binding affinity comprise binding free energy, 
association/dissociation constants and IC50 values. The prediction of these values also forms 
part of the invention. Properties that are indirectly related to binding affinity comprise, for 
example, association/dissociation rates (on/off rates), immunogenicity and conformational 
flexibility. An aspect of the present invention may be a method for prediction of kinetic and 
immunogenic properties. Another aspect of the present invention may be a method for 
simulation and quantification of conformational flexibility. 

A method of the present invention provides a novel approach to structure-based 
prediction of MHC/peptide affinities, comprising a quantitative assessment of the affinity of a 
selected peptide sequence for a selected MHC allotype through four computational steps. 

The first three steps relate to the prediction of multiple 3D structures for the selected 
MHC/peptide complex by gradually adding levels of detail in the consecutive modeling steps. 
The fourth step analyzes structural information and applies a specific scoring function in order 
to translate the structural information into a predicted peptide binding affinity. A method of the 
present invention comprises steps 1 to 4, summarized as follows (see also FIGURE 1). 

1. MHC template construction. A suitable 3D model for the selected MHC allotype is 
generated, either by retrieval from the Protein Data Bank (PDB) or by a standard homology 
modeling method. This model serves as an input template structure for the next steps. The 
model is devoid of any peptide structure, i.e. the binding groove is "emptied". For the purpose 
of this section only, the model is referred to as "MHC". 

2. MHC/peptide main-chain construction. The MHC template structure from step 1 is 
complemented with an ensemble of peptide backbone (i.e. main-chain) conformations. This 
leads to an ensemble of 3D structures consisting of a structurally constant part, MHC, and a 
variety of peptide main-chain structures. For the purpose of this section only, the said 
ensemble is named "{p_mc}". The union of MHC and the multiple representations of peptide 
backbones is denoted as "{MHC/p_mc}" in this description. The latter set of structures may be 
generated, for example, by a suitable computer modeling algorithm that yields multiple 
energetically feasible peptide backbone configurations in relation to MHC, called, for the 
purpose of this description, a "docking approach". In another example, the set of structures 
may be generated by a method which retrieves pre-oriented peptide structures from a library, 
said method called the "database approach" for the purpose of this description. Both 
approaches are discussed in detail below. 

3. MHC/full peptide construction. A third step concerns the addition and modeling of 
side-chains. In accordance with the amino acid sequence of the selected peptide, each residue 
position of p_mc in each structure of the set {MHC/p_mc} is provided with the correct side-chain. In 
the event that the correct side-chains are already present (for example, if step 2 was 
performed by docking of the same peptide), the mutation step may be skipped. More important 
is the modeling of each MHC/p_mc structure. In one embodiment of the present invention, this is 
accomplished by a suitable side-chain placement algorithm such as a FASTER or a DEE 
method. The modeling of side-chains need not be limited to those of the peptide; 
one aspect of the invention is to include in this step a number of MHC side-chains as well. Even if 
step 2 was performed by a docking method, the invention allows for the re-modeling of at least 
all receptor side-chains in contact with the peptide, in addition to the side-chains of the peptide 
itself. Thus, step 3 of a method of the present invention delivers an ensemble of full complex 
structures at atomic detail, denoted as {MHC/p_full} for the purposes of this description, wherein 
the side-chain conformations are optimally adapted to each p_mc structure in relation to MHC. 

4. MHC/peptide affinity assessment. One aim of step 4 is to compute a single 
scoring value reflecting the binding affinity of the selected peptide for the selected MHC 
allotype. A source of input data is the structural information obtained in step 3. The final score 
of the considered system is obtained by applying a function called the affinity scoring function, 
F, for the purpose of the present description, which has been optimized so as to correlate with 
the true thermodynamic free energy of binding. As explained further below, this function 
preferably comprises components related to the conformational energy, the effect of the 
solvent, and specific amino acid type-based terms that have been parametrized. These types 
of contributions are not ensemble properties, i.e. they are computed for each individual 
structure of the set {MHC/p_full}. Yet working with multiple structures, or ensembles, enables 
certain structure-derived contributions to be averaged, thereby reducing the noise level. 
Processing these contributions leads to a first component of the predicted affinity in the 
form of an average energy component for the whole ensemble, termed <E> for the purpose of 
the present description. Another essential component of F is the entropical contribution 
(termed S for the purpose of the present invention), derived in accordance with statistical 
mechanical rules and accounted for by an equation: 

F = <E> - cS    [1] 

In equation [1], c is a parametrized constant which theoretically corresponds with the 
absolute temperature (in kelvin) at which the MHC/peptide system is simulated. The 
entropy contribution S is preferably taken to be the logarithm of the number of energetically 
acceptable structures within the set {MHC/p_full}. Clearly, S is an ensemble property reflecting 
the overall conformational flexibility of the selected peptide in the complex. It is also noteworthy 
that the more negative <E> and the more positive S, the lower will be F, thus the higher will be 
the predicted affinity, in agreement with thermodynamic principles. 
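
The following is a minimal sketch of how equation [1] might be evaluated for one ensemble. The function name, the default value of c, the energy window used to decide which structures are "energetically acceptable", and the use of a plain arithmetic mean for the average energy are all assumptions made for illustration; they are not values or criteria specified in this description.

```python
import math


def affinity_score(energies, c=0.6, energy_window=5.0):
    """Score one MHC/peptide ensemble as F = <E> - c*S.

    energies: one energy per full complex structure (consistent arbitrary units).
    c: parametrized constant (hypothetical default value).
    energy_window: structures within this window above the lowest energy are
        counted as energetically acceptable (assumed acceptance criterion).
    """
    if not energies:
        raise ValueError("ensemble is empty")
    e_min = min(energies)
    accepted = [e for e in energies if e - e_min <= energy_window]
    mean_energy = sum(accepted) / len(accepted)  # <E>, plain mean (assumption)
    entropy = math.log(len(accepted))            # S = ln(number of acceptable structures)
    return mean_energy - c * entropy
```

With this sketch, two ensembles having the same average energy are ranked according to their size: the ensemble with more acceptable structures has the larger S and therefore the lower F, i.e. the higher predicted affinity.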

In step 2 of the invention - obtaining an ensemble of multiple conformations for 
the main-chain of the peptide located in the target-MHC binding site - two means for 
generating said ensembles are suggested as examples: 

(A) A basic method, also referred to as the "docking approach", wherein peptide 
main-chain conformations or "binding modes" are generated via molecular modeling, preferably 
peptide docking. 

(B) An advanced method, also referred to as the "database method", wherein peptide 
main-chain conformations are retrieved from a database of structures. 

An underlying hypothesis of the database method might be explained by the following: 
peptides can assume only a limited number of binding modes, irrespective of their amino acid 
sequence. Assuming the validity of this hypothesis, this means that different independently 
performed docking experiments of peptides varying in sequence (but not in length) are likely to 
show some partial overlap between the generated ensembles. In a more formal notation this 
corresponds to the situation wherein 

{MHC/p_mc} ∩ {MHC/p'_mc} ≠ ∅    [2] 

The merging of a sufficient number of ensembles resulting from independent docking 
experiments with different peptide sequences may therefore lead to the establishing of a 
generalized ensemble of possible MHC/p_mc structures, hereby denoted as {MHC/P_mc}. The 
exact amino acid sequence of each peptide in this ensemble then becomes irrelevant (in view 
of the structural overlap between the constituting populations). In other words, the set 
{MHC/P_mc} might be seen as the structure MHC provided with a variety of pure peptide 
backbone conformations, or "poly-alanine" peptide conformations. 

An aspect of the present invention in which peptide main-chain conformations are 
retrieved from a library has advantages over other methods. One advantage is of course a 
drastic reduction of the computational time per peptide. Docking simulations are often 
extremely demanding in computing time because of the huge search space. (The latter 
consists of three translational, three rotational and a large number of conformational degrees 
of freedom, making up a total space with very high dimension.) An indirect advantage is the 
fact that the prediction accuracy can be improved because more attention can be paid to the 
important side-chain placement and affinity prediction steps. Finally, for various technical 
reasons some peptide binding modes may be missed in a docking experiment, whereas they 
are represented in the generalized ensemble, on condition that the latter covers the 
full accessible space. 

An ensemble {MHC/P_mc} only depends on two variables: MHC allotype and peptide 
length. Any sequence information may be suppressed in view of the scope of any such 
ensemble: representing peptide main-chain binding modes. In one embodiment of the present 
invention, MHC/P_mc structures are preferably stored in a format wherein the peptides are 
converted into poly-alanine fragments. In another embodiment, a generic database may be 
compiled from different MHC allotype-specific and peptide length-specific structural libraries. 

Such a database may be used, for example, to predict affinities for peptides of different 
length or to predict the affinity of a given peptide for different MHC types. 
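
As a hedged illustration of such a database, the sketch below stores sequence-free binding modes under a key formed by the MHC allotype and the peptide length. The class name, the representation of a backbone as a tuple of (phi, psi) angle pairs, and the rounding to whole degrees as a criterion for treating two modes as identical are assumptions of this sketch; in particular, a real library would also need to record the position and orientation of the peptide in the binding groove.

```python
from collections import defaultdict


class BindingModeLibrary:
    """Toy store of peptide main-chain binding modes keyed by (allotype, length)."""

    def __init__(self):
        self._modes = defaultdict(set)

    def add_ensemble(self, allotype, backbones):
        """Merge a peptide-specific ensemble and return the number of new modes.

        backbones: iterable of tuples of (phi, psi) pairs, one pair per residue.
        No sequence is stored, mirroring a poly-alanine representation.
        """
        added = 0
        for backbone in backbones:
            key = (allotype, len(backbone))
            # Round to whole degrees so near-identical modes coincide (assumption).
            mode = tuple((round(phi), round(psi)) for phi, psi in backbone)
            if mode not in self._modes[key]:
                self._modes[key].add(mode)
                added += 1
        return added

    def retrieve(self, allotype, peptide_length):
        """Return the stored binding modes for one allotype and peptide length."""
        return sorted(self._modes.get((allotype, peptide_length), set()))
```

When ensembles from further docking runs add few or no new modes, the stored populations overlap, which is the situation expressed by equation [2].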

Detailed steps of a method of the present invention comprise the following: 

1. Construction of an MHC template. A method of the present invention requires two 
basic elements of input data, besides a number of execution parameters (see FIGURE 2 for a 
schematic overview of the complete method). The first element is the selection of an MHC 
allotype of interest, the second one is the sequence of a peptide as present in a protein source 
of interest, for example a viral protein. Selecting an MHC allotype is equivalent to selecting the 
amino acid sequence representing the MHC allele. With this sequence (or a reference to it) it is 
possible to search the Protein Data Bank (PDB) for the presence of 3D structures sharing the 
same amino acid sequence. If such a structure exists, it can be retrieved from the PDB (Berman, 
H.M. et al., (2000) Nucleic Acids Res. 28, 235-242) and used as a three-dimensional MHC 
template structure in the further prediction steps. In the event that more than one candidate 
structure is available, the user has to decide which one is the most preferred starting structure. 
Useful criteria for this purpose are the crystallographic resolution and refinement, the absence 
of missing atoms, and/or the criteria applied by structure validation tools such as the Biotech 
Validation Suite (www.embl-heidelberg.de, and follow links therein for the Biotech Validation 
Suite). 

In the case that neither the PDB database nor available publications describe the 
structural co-ordinates of a sequence identical to that of the selected MHC allotype, a template 
structure may be constructed by homology modeling. Various methods for homology modeling 
include, for example, Swiss-Model (Guex, N. and Peitsch, M.C. (1997) Electrophoresis 18, 
2714-2723) or SCWRL (Bower, M. et al., (1997) J. Mol. Biol. 267, 1268-1282). Because 
the modeling of MHC binding grooves involves no insertions or deletions, a pure side-chain 
placement algorithm can be applied. A preferred method to accomplish this is a DEE method 
(De Maeyer et al., 2000) or the FASTER method as described by Desmet et al. (Desmet, J. et 
al., (2002) Proteins 48, 31-43). Once a template structure has been retrieved or modeled, it is 
within the scope of the present invention to refine it by performing 100-200 steps of steepest 
descent energy minimization, or by any equivalent energy minimization procedure. Such 
energy minimization is a standard procedure in protein modeling and serves to resolve 
potential atomic conflicts or suboptimal positioning. 
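
A fixed-length steepest descent refinement of the kind mentioned above may be sketched as follows. The function names, the fixed step size and the harmonic toy energy are assumptions chosen to keep the example self-contained; a real refinement would use the gradient of a molecular force field over all atomic co-ordinates.

```python
def steepest_descent(coords, gradient, steps=150, step_size=0.01):
    """Run a fixed number of steepest descent steps.

    coords: flat list of co-ordinates.
    gradient: function returning dE/dx for each co-ordinate.
    steps: number of iterations (150 lies in the 100-200 range cited above).
    """
    x = list(coords)
    for _ in range(steps):
        g = gradient(x)
        # Move every co-ordinate a small distance against its gradient.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def harmonic_energy(x, target=(0.0, 0.0)):
    """Toy energy: squared distance of each co-ordinate from a target value."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def harmonic_gradient(x, target=(0.0, 0.0)):
    """Gradient of harmonic_energy with respect to each co-ordinate."""
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]
```

For example, `steepest_descent([1.0, -2.0], harmonic_gradient)` moves both co-ordinates towards the minimum of the toy energy without overshooting, since each step removes a fixed fraction of the remaining displacement.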

In one embodiment of the invention, a method which is followed by a user in advanced 
execution mode, i.e. the database approach, merely involves the selection of the appropriate 
{MHC/P_mc} ensemble from the database, said ensemble corresponding with the MHC allotype 
of interest. In this case the MHC template construction step may not be explicitly executed but 
is implicitly present in the structure retrieved from the database. 

2. MHC/peptide main-chain construction. One step of the present method is the 
construction of an ensemble of peptide main-chain configurations {p_mc} in relation to the MHC 
template, or {MHC/p_mc}. The selected peptide p is characterized by a well-defined amino acid 
sequence. It is logical to assume that the sequence of p has at least some influence on the 
ensemble of binding modes or, in other words, that {MHC/p_mc} is sequence-specific. On the 
other hand, the very nature of MHC class I and class II binding grooves also suggests that the 
number of distinct binding modes is limited. Therefore, the construction of peptide backbones 
might be performed in more than one way. For example, a sequence-specific {MHC/p_mc} 
ensemble is created for each new peptide. Or, in another example, a generalized ensemble 
{MHC/P_mc} might be made available, representing at least the conformational space of the 
selected peptide p. An over-representation of the space is not so much of a problem because 
the generalized ensemble {MHC/P_mc} may be reduced to the peptide-specific ensemble 
{MHC/p_mc} in step 3 of a method wherein MHC-incompatible binding modes are identified after 
side-chain placement. Furthermore, the establishing of a generalized ensemble can be 
accomplished in a straightforward manner by unifying different peptide-specific ensembles until 
a sufficient overlap between the populations is observed. Consequently, step 2 of a method of 
the present invention reduces to the problem of generating peptide-specific {MHC/p_mc} 
ensembles. 

An example of a method of constructing the peptide backbone is found in Desmet et al. 
(1997, 2000). This docking method is a combinatorial algorithm for flexible docking of peptides 
to the binding site on a protein receptor molecule in which the peptide is constructed from 
scratch in relation to the chosen receptor structure, thereby avoiding any potential bias from a 
starting structure of the receptor/peptide complex. It yields a collection of different, 
energetically favorable complex structures wherein the peptide assumes, typically, between 0 
and 500 distinct binding states. This de novo peptide building method is therefore the most 
preferred approach to generate the contemplated {MHC/p_mc} ensembles. The method of 
Desmet et al. (1997, 2000) is herein explicitly incorporated by reference. Its essential 
execution steps and characteristics are outlined in the following. 

The docking method referred to above consists of a combinatorial buildup algorithm 
that "grows" the peptide by gradual addition of a single residue adopting a specific main-chain 
conformation. For each residue type there are 47 low energy main-chain rotamers and for 
each main-chain rotamer there are a variable number of backbone-compatible side-chain 
rotamers. Glycine, proline and N- or C-terminal residues form an exception and have 125, 35 
and 12 main-chain rotamers, respectively. The rotamer library thus represents the entire 
conformational space for each residue type. 

The docking algorithm starts from a peptide fragment of length one, i.e. a user-selected 
root residue. (This can be any residue of the peptide.) The accessible space for the root 
residue is searched by a combined translational, rotational and conformational exploration. 
Translations and rotations are performed in a discretized fashion in accordance with a grid 
approach. The conformational sampling is done separately for the main-chain and side-chain 
parts of the system. The main-chain conformation is only varied for the peptide, whereas that 
of the receptor is strictly kept fixed. Possible main-chain conformations for the peptide, in this 
case the root residue, are selected from the main-chain rotamer library (containing mostly 47 
rotamers per residue type). Possible side-chain conformations are retrieved from a 
backbone-dependent side-chain rotamer library. Besides the side-chain of the peptide's root residue, up 
to about 40 side-chains from the receptor can be modeled simultaneously. The side-chain 
placement step is fully repeated for every translational-rotational-(backbone)-rotameric 
combination of the root residue, one such step called a single docking step. The side-chain 
placement itself is performed by a standard DEE method (Desmet et al., 1992). The net result 
of each docking step is an energy value, E_bind, reflecting the "quality of fit" of the peptide's 
root residue in the considered binding mode. E_bind is computed by a rich function, including the 
interaction energy between the peptide (root) fragment and the receptor, the total fragment 
self-energy and the augmentation of the receptor self-energy due to conformational changes 
induced by the presence of the fragment. This value serves as a discriminator between 
energetically acceptable and prohibited binding modes (applying a user-defined threshold 
value). All energetically acceptable single-residue fragments are added to a peptide fragment 
repository. 

The buildup of the peptide continues by combining each previously accepted fragment 
in the repository with the available main-chain rotamers of an adjacent residue. Each new 
combination is again processed individually by the DEE-based side-chain placement algorithm. 
All energetically favorable fragments are added to the peptide fragment repository. This 
buildup process continues until all fragments have been extended to the full length of the 
peptide. Thus, in the end the peptide fragment repository contains only energetically acceptable 
full-length peptides. 
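
The buildup principle may be illustrated by the simplified sketch below. It is not the algorithm of Desmet et al.: the translational and rotational grid search and the DEE side-chain placement are omitted, growth proceeds in one direction only, the same rotamer set is offered at every position, and the energy function is a caller-supplied placeholder. All names are hypothetical.

```python
def build_up(n_residues, rotamers, fragment_energy, threshold):
    """Grow peptide fragments one residue at a time, pruning by energy.

    rotamers: main-chain rotamer labels available at every position (toy).
    fragment_energy: function mapping a tuple of rotamer labels to an energy.
    threshold: fragments with an energy above this value are discarded.
    Returns the repository of accepted full-length fragments.
    """
    # Root residue: keep every single-residue fragment that passes the threshold.
    repository = [(r,) for r in rotamers if fragment_energy((r,)) <= threshold]
    for _ in range(n_residues - 1):
        extended = []
        for fragment in repository:
            for r in rotamers:
                candidate = fragment + (r,)
                if fragment_energy(candidate) <= threshold:
                    extended.append(candidate)
        repository = extended  # only accepted fragments are grown further
    return repository
```

Because rejected fragments are never extended, the number of combinations examined stays far below the full combinatorial product whenever the threshold is selective.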

One aspect of a fragment repository is that it may hold only information related to the 
binding mode of the peptide's main-chain; reference to a specific conformation for the 
side-chains may not be stored. 

One embodiment of the present invention is the storage of modes identified by the 
docking method into a general database of {MHC/P_mc} ensembles. In view of the usage of 
such a database in providing a generic source of binding modes (i.e. when applying the 
advanced database-related operation mode of a method of the present invention), the peptide 
conformations are preferably stored as poly-alanine or poly-glycine constructs. The only form 
of specificity in the database concerns the MHC allotype and length of the generic peptide 
fragments. 

3. MHC/full peptide construction. Step 3 of a method of the present invention involves 
the reconstruction of peptide and optionally the receptor side-chain conformations in order to 
build full complex structures. This structural information forms the main source of input 
information for the next step 4 of the present method. 

In view of the fact that the present invention is almost exclusively based on properties 
derived from predicted structures, the accuracy of this step is directly related to the prediction 
accuracy of the peptide binding affinity, i.e. an important aim of the present invention. 

The accuracy of any side-chain placement method may be determined by three 
aspects: (i) the search method that is used to determine the optimal global side-chain 
arrangement, (ii) the rotamer library from where potential side-chain conformations are 
retrieved, and (iii) the quality of the scoring function used during conformational search. A 
fourth determinant of accuracy, i.e. the coupling between main-chain and side-chain 
conformational changes, is also considered. It may be implicitly calculated from the above 
because side-chain conformations are generated for a broad ensemble of peptide main-chain 
structures. The first three determinants of prediction accuracy are discussed in more detail. 

1. Preferred side-chain conformational search method. The present inventors have 
recently developed a novel method for fast and accurate side-chain modeling called the "fast 
and accurate side-chain topology and energy refinement method" or FASTER method (Desmet 
et al., 2002). In view of its characteristics, the FASTER method is highly preferred to perform 
step 3 of the present method. The main reason for this is that FASTER allows a rapid yet 
accurate search for the globally optimal side-chain arrangement, which is one of the key 
aspects of the present invention. More specifically, for each MHC/P_mc structure of the 
ensemble generated in step 2, all side-chains of the peptide and a significant number of 
side-chains from the MHC receptor (typically 10-30) are modeled simultaneously in order to find the 
globally best packing arrangement. In doing so, all possible pair-wise interactions between two 
flexibly treated side-chains are taken into account during the modeling. This is in contrast to 
other methods (e.g. Swain et al., 2001) which only score the side-chain conformations of the 
peptide and which do this independently for each side-chain. 

Apart from the FASTER method, other side-chain placement methods are suitable for 
performing step 3 of the present invention, such as DEE (De Maeyer et al., 2000), 
self-consistent mean field optimization (Koehl, P. and Delarue, M. (1994) J. Mol. Biol. 239, 249-275), 
simulated annealing (Shenkin, P.S. et al., (1996) Proteins 26, 323-352), a genetic 
algorithm (Tuffery, P. et al., (1997) Protein Eng. 10, 361-372) or Monte Carlo simulation (Holm, 
L. and Sander, C. (1992) Proteins 14, 213-223). In general, methods which explicitly account 
for pair-wise side-chain/side-chain interactions are preferred. Such methods may follow either 
a rotameric or a non-rotameric strategy. 

2. Rotamer library. When performing step 3 on the basis of the FASTER or a DEE method, 
the algorithm requires access to a library of discrete, preferential side-chain conformations or 
rotamers. Such a library may be called a rotamer library. Non-limiting examples include Ponder 
and Richards (Ponder, J.W. and Richards, F.M. (1987) J. Mol. Biol. 193, 775-791), Tuffery et 
al. (Tuffery, P. et al., (1991) J. Biomol. Struct. Dynam. 8, 1267-1289), Holm and Sander 
(1992), Schrauber et al. (Schrauber, H. et al., (1993) J. Mol. Biol. 230, 592-612), Dunbrack 
and Karplus (Dunbrack, R.L. Jr. and Karplus, M. (1993) J. Mol. Biol. 230, 543-574), De Maeyer 
et al. (1997), Mendes et al. (Mendes, J. et al., (1999) Proteins 37, 530-543), and Xiang and Honig 
(Xiang, Z. and Honig, B. (2001) J. Mol. Biol. 311, 421-430). One way to define rotamers is to 
store them as a list of torsional angle values for all rotatable bonds within a particular 
side-chain type and for the chemical bond that connects it to the backbone. Alternatively, rotamers 
in the library may be stored as sets of atomic co-ordinates in a given reference frame. 
Whatever rotameric representation is chosen, it is preferred that the rotamer library provide the 
necessary and sufficient information to reconstruct side-chain conformations in an 
unambiguous way onto a polypeptide backbone. One example of a preferred rotamer library is 
the one devised by Mendes et al. (1999), comprising so-called "flexible rotamers". Herein, a 
flexible rotamer is essentially defined as an ensemble of sub-rotamers deviating slightly in 
structure from a classic rigid rotamer. The latter type of rotamers is especially suited for the 
present method since it enables quantification of side-chain entropical effects, both for peptide 
and receptor side-chains, in a similar fashion as for the peptide main-chain. Also preferred are 
highly detailed libraries of classic rigid rotamers, whether backbone-dependent (Dunbrack & 
Karplus, 1993; Bower et al., 1997; Desmet et al., 1997) or backbone-independent (De Maeyer 
et al., 1997; Xiang & Honig, 2001). A less preferred method for assigning side-chain 
conformations is by applying a non-rotameric approach such as a molecular mechanics or 
dynamics method, or a combination protocol (Rognan et al., 1999). Non-rotameric methods 
are less preferred because they are slower and less efficient in conformational sampling 
(Mendes et al., 1999), though they fall within the scope of the present invention. 

3. Scoring function for side-chain placement. A method of the present invention 
distinguishes between two separate scoring functions, the first being applied to structure 
prediction of side-chains (and also peptide main-chains, if step 2 of the present method is 
performed by way of docking), and the second scoring function being applied in the affinity 
prediction step (see step 4, MHC/peptide affinity assessment). As it is intended for usage in 
conjunction with a method for searching (sampling) huge conformational hyperspaces, the first 
scoring function is preferably intrinsically rapid to evaluate and, also, it does not have to 
include as many energetical components as an affinity scoring function. One purpose of the 
said scoring function is to allow the determination of the correct conformation of a specific 
MHC/peptide complex. For this reason, a standard potential or free energy function might be 
applied that accounts for the intramolecular interactions. Such a function is usually called a 
force field function. Non-limiting examples of widely used force fields include the CHARMM 
force field (Brooks, B.R. et al., (1983) J. Comput. Chem. 4, 187-217), the AMBER force field of 
Kollman and co-workers at UCSF (Weiner, S.J. et al., (1984) J. Am. Chem. Soc. 106, 765-784) 
and the DREIDING field (Mayo, S.L. et al., (1990) J. Phys. Chem. 94, 8897-8909). The applied 
energy function may include as many relevant energetic contributions as possible, non-limiting 
examples of which include van der Waals interactions, H-bond formation, electrostatic 
interactions and contributions related to chemical bonds (bond stretching, angle bending, 
torsions, planarity deviations). The present inventors have shown that these energy terms 
suffice to reach the currently highest possible accuracy in side-chain prediction while allowing 
very rapid modeling (Desmet et al., 2002). The scope of the present invention allows for force 
fields which satisfy any of the above. In one embodiment of the present invention, the 
preferred force field is CHARMM (Brooks et al., 1983). 

4. MHC/peptide affinity assessment. The ligand binding affinity (K_b) is related to the
binding free energy (ΔG) by the following equation:

$$\Delta G = -RT \ln(K_b) \qquad [3]$$

where R is the ideal gas constant (8.31 J mol⁻¹ K⁻¹) and T the absolute temperature in
kelvin. Further, K_b is the inverse of the dissociation constant (K_d), which is approximately equal
to the often mentioned IC50 value:

$$\Delta G = RT \ln(K_d) \approx RT \ln(IC_{50}) \qquad [4]$$
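
As an illustration of Eqs. [3] and [4], the following minimal Python sketch converts an IC50 value into an approximate binding free energy. The function name, the default temperature of 298.15 K, the 1 M standard state and the choice of kJ/mol as output unit are assumptions made for this illustration only; they are not prescribed by the method.

```python
import math

R = 8.314  # ideal gas constant in J mol^-1 K^-1


def delta_g_from_ic50(ic50_molar: float, temperature_k: float = 298.15) -> float:
    """Approximate binding free energy (kJ/mol) via dG ~ RT ln(IC50), Eq. [4].

    ic50_molar is the IC50 expressed in mol/L (1 M standard state assumed).
    """
    if ic50_molar <= 0.0:
        raise ValueError("IC50 must be a positive concentration")
    # Divide by 1000 to convert J/mol to kJ/mol.
    return R * temperature_k * math.log(ic50_molar) / 1000.0


if __name__ == "__main__":
    for ic50 in (1e-9, 1e-6, 1e-3):
        print(f"IC50 = {ic50:.0e} M -> dG = {delta_g_from_ic50(ic50):.1f} kJ/mol")
```

Lower IC50 values give more negative free energies, in line with the statement that strongly negative values indicate strong binding.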

The binding free energy, ΔG, is the difference in Gibbs free energy between the free
receptor molecule plus the free peptide ligand on the one hand and the receptor/ligand
complex on the other hand. Strongly negative ΔG values indicate strong binding. Differences in
ΔG for different peptides and/or different MHC subtypes may be due to a variety of reasons,
including enthalpic and entropic effects related to any of the free or bound states. Since many
of these effects can by no means be deduced from theoretical simulations, affinity scoring
functions might include more than one parametrized component. A basic approach of the
present invention is then to incorporate into the predicted binding free energy, ΔG_pred, as much
relevant structural information as possible, and to cover all other effects by empirical
components. Assuming that the different contributions are independent and additive, the
following is an example of a general expression which reflects the predicted binding free
energy:

$$\Delta G_{\mathrm{pred}} = \sum_{i=1}^{N_S} s_i S_i + \sum_{i=1}^{N_P} p_i P_i \qquad [5]$$

In equation [5], S_i and P_i are structure-derived and non-structure-derived contributions,
respectively. N_S and N_P are the numbers of considered contributions of both types, while s_i and
p_i are their respective weight coefficients. It should be noted, however, that most methods
consider either structure-based or non-structure-based terms but seldom both. The
coefficients s_i and the number of structural components N_S are in fact parameters as well since
they need to be calibrated. The coefficients p_i are in many methods set equal to unity.
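
The additive form described for Eq. [5] can be sketched in a few lines of Python. This is only an illustration under the stated assumption that the contributions are independent and additive; the function and argument names are hypothetical, and defaulting the non-structural weights to unity simply mirrors the remark that many methods set them to 1.

```python
from typing import Sequence


def predicted_delta_g(
    structural: Sequence[float],
    s_weights: Sequence[float],
    non_structural: Sequence[float] = (),
    p_weights: Sequence[float] = (),
) -> float:
    """Weighted sum of structural (S_i) and non-structural (P_i) terms."""
    if not p_weights:
        # Many methods set the coefficients p_i equal to unity.
        p_weights = [1.0] * len(non_structural)
    if len(structural) != len(s_weights) or len(non_structural) != len(p_weights):
        raise ValueError("each contribution needs exactly one weight")
    s_part = sum(w * term for w, term in zip(s_weights, structural))
    p_part = sum(w * term for w, term in zip(p_weights, non_structural))
    return s_part + p_part
```

In practice the weights would come from a calibration against experimental affinities rather than being chosen by hand.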

With respect to the structure-related terms in Eq. [5], one approach is to sum over all
contributions provided by a force field function (e.g. electrostatic, van der Waals, H-bonding
terms, etc.). However, pure standard force field terms generally do not yield an optimal
correlation with experimental data. Including additional effects, non-limiting examples of which
include desolvation, freezing of rotatable bonds and special hydrophobicity terms, may significantly
enhance correlation. The "Fresno" method (Rognan et al., 1999) considers five individual
contributions: H-bonding, lipophilic contacts, rotatable bond freezing, burial of polar atoms and
desolvation. This scoring function requires re-calibration of the weight coefficients for different
MHC subtypes. The method of Schueler-Furman et al. (2000) only considers MHC side-
chain/peptide side-chain contacts (with a special treatment of MHC side-chains in contact with
the peptide backbone) in conjunction with a statistical pairwise potential.

Scoring functions based on experimental data often rely on the frequency of amino acid
types observed at each position in a population of peptides (e.g. self peptides) that are known
to bind to a specific MHC allele (Rammensee et al., 1999). Alternatively, the contribution of
individual amino acid types at each position in a peptide sequence to the peptide's total
binding affinity may be estimated by a number of statistical analyses. This can be done for a
set of known binding peptides (Parker et al., 1994) or experimentally constructed peptides
(Hammer et al., 1993; Fleckenstein et al., 1999).

A method of the present invention is predominantly based on 3D structural
contributions. Structural contributions preferably comprise: (i) all terms that can be computed,
using a force field, e.g. CHARMM (Brooks et al., 1983), for a MHC/p_full complex resulting from
step 3 of a method; (ii) contributions computed in the same way for separately modeled
reference states of the free peptide and receptor; (iii) contributions accounting for desolvation
of both the receptor and the peptide upon complex formation; and (iv) importantly, entropical
contributions derived in accordance with a statistical mechanical analysis of the ensemble of
structures obtained in step 3, i.e. {MHC/p_full}.

When following the standard docking approach to generate the latter ensemble, one
generally obtains a limited set of complex structures that are all energetically relaxed. In one
embodiment of a method of the present invention, the contributions (i) to (iii) are added up for
each structure of the ensemble and each sum is given the weight coefficient s_i = 1/N_sol, where
N_sol is the number of solutions in the ensemble. This yields the energetical term <E> in Eq. [1].
The structure-related component (iv), corresponding to the entropical contribution S in Eq. [1],
may be set equal to ln(N_sol), or k_B ln(N_sol) where k_B is Boltzmann's constant. The latter
constant may be included in the weight coefficient (c in Eq. [1], corresponding to s_entropy in Eq.
[5]). This coefficient is subject to global parameter optimization, which is to be executed by a
suitable parameter optimization method. A non-limiting example illustrating the importance of
including an entropical component is provided in EXAMPLE 4.
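
The ensemble-based terms of this embodiment can be sketched as follows. The per-structure energies are taken to be the sums of contributions (i) to (iii). Because Eq. [1] is not reproduced in this passage, the way <E> and S are combined in `ensemble_score` (a plain linear combination whose coefficient c carries both sign and units) is an assumption of this sketch, as are all function names.

```python
import math
from typing import Sequence, Tuple


def ensemble_terms(energies: Sequence[float]) -> Tuple[float, float]:
    """Return (<E>, S) for an ensemble of relaxed complex structures.

    <E> weights every structure by 1/N_sol; S is set to ln(N_sol).
    """
    n_sol = len(energies)
    if n_sol == 0:
        raise ValueError("an empty ensemble has no defined score")
    mean_energy = sum(e / n_sol for e in energies)
    entropy = math.log(n_sol)
    return mean_energy, entropy


def ensemble_score(energies: Sequence[float], c: float) -> float:
    """ASSUMED combination <E> + c * S; c carries the sign and the units."""
    mean_energy, entropy = ensemble_terms(energies)
    return mean_energy + c * entropy
```

With a single docking solution the entropical term vanishes (ln 1 = 0), so the score reduces to the energy of that structure.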

When a method of the present invention is performed in accordance with the advanced
database-related execution mode, a more sophisticated method may be needed to determine
the appropriate weight coefficients of aforementioned contributions (i) to (iv), preferably on the
basis of statistical mechanical relationships.

Besides structure-related contributions (S_i in Eq. [5]), it is within the scope of the
present method to consider a number of non-structural terms (P_i in Eq. [5]). A first possibility is
a combination method formed by fusing a structure-based and an experimental method. This is
accomplished by determining the globally optimal set of weight coefficients {s_i, p_i} by applying a
suitable parameter optimization method.
A preferred possibility is to include topology contributions, for example the "Type and
Topology Specific" (TTS) contributions of Desmet et al. (International Patent Application No.
WO 02/05146), which were devised in the context of protein design. This method
considers a limited number of topology classes (typically 2 or 3), depending on a residue's
degree of burial in a complex. The notion of topology may also be extended so as to reflect,
besides shielding from solvent, the chemical nature of a residue's environment, for example a
measure of polarity. Furthermore, it is within the scope of the present invention to consider an
alternative to the residue type dimension in the concept of TTS parameters, namely
distinguishing chemical groups instead of residue types. A preferred classification of chemical
groups is the following: 1, CHx aliphatic; 2, CHx aromatic; 3, NHx aromatic; 4, OH; 5, S+SH; 6,
NH3+; 7, COO-; 8, CONH2; 9, NHC(NH2)2+. This way, the type-dimension in the set of TTS
parameters can be restricted to 9 groups (instead of 20 residue types). The option to work with
chemical groups is fully compatible with the broader definition of topology. This creates a
landscape of possibilities that can be explored by applying a suitable data mining and
parameter optimization strategy, which is within the scope of the present invention. It is further
within the scope of the invention to identify and quantify the most relevant contributions in the
attempt to enhance the correlation between predicted and experimental ΔG values. The
incorporation of type- and topology-specific contributions again leads to a fully structure-based
method.
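
To make the size of the TTS parameter set concrete, the sketch below enumerates one parameter per (type, topology class) pair and sums such parameters over the groups counted in a complex. The group labels follow the nine-group classification listed above, with the garbled superscripts read as charges, which is an assumption. The function names are hypothetical and any numerical parameter values passed in are placeholders to be obtained by parameter optimization, not data from the invention.

```python
from typing import Dict, List, Sequence, Tuple

# Nine chemical groups of the preferred classification (labels as read here).
CHEMICAL_GROUPS: List[str] = [
    "CHx aliphatic", "CHx aromatic", "NHx aromatic", "OH", "S+SH",
    "NH3+", "COO-", "CONH2", "NHC(NH2)2+",
]

Key = Tuple[str, int]  # (type label, topology class index)


def tts_keys(types: Sequence[str], n_topology_classes: int) -> List[Key]:
    """One parameter per combination of type and topology class."""
    return [(t, k) for t in types for k in range(n_topology_classes)]


def tts_contribution(counts: Dict[Key, int], params: Dict[Key, float]) -> float:
    """Sum the parameter of every (type, topology) key, weighted by its count."""
    return sum(params[key] * n for key, n in counts.items())
```

With three topology classes, nine groups give 27 parameters to calibrate, against 60 when the 20 residue types are used.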

As used herein, a "peptide" refers to at least two covalently attached amino acids, which
includes polypeptides and oligopeptides. The peptide may be made up of naturally occurring
amino acids and peptide bonds, or non-naturally-occurring amino acids or synthetic
peptidomimetic structures, i.e., "analogs" such as peptoids [see Simon, R.J. et al., (1992) Proc.
Natl. Acad. Sci. U.S.A. 89(20), 9367-9371], generally depending on the method of synthesis.

The peptides of the invention can be prepared by classical chemical synthesis. The
synthesis can be carried out in homogeneous solution or in solid phase. For instance, the
synthesis technique in homogeneous solution which can be used is the one described by
Houben-Weyl in the book entitled "Methoden der organischen Chemie" (Methods of Organic
Chemistry), edited by E. Wünsch, vol. 15-I and II, Thieme, Stuttgart, 1974. The peptides of the
invention can also be prepared in solid phase according to the methods described by Atherton
and Sheppard in their book entitled "Solid phase peptide synthesis" (IRL Press, Oxford, 1989).
The peptides according to this invention can also be prepared by means of recombinant DNA
techniques as described by Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd
edition, New York, Cold Spring Harbor Laboratory, 1989).

"Amino acid", or "residue", as used herein means both naturally occurring and synthetic 
amino acids. For example, homo-phenylalanine, citrulline, and norleucine are considered
amino acids for the purposes of the invention. "Amino acid" also includes imino acid residues
such as proline and hydroxyproline. In addition, any amino acid representing a component of
the variant proteins of the present invention can be replaced by the same amino acid but of the
opposite chirality. Thus, any amino acid naturally occurring in the L-configuration (which may
also be referred to as the R or S, depending upon the structure of the chemical entity) may be
replaced with an amino acid of the same chemical structural type, but of the opposite chirality,
generally referred to as the D-amino acid but which can additionally be referred to as the R- or
the S- form, depending upon its composition and chemical configuration. Such derivatives have the
property of greatly increased stability, and therefore are advantageous in the formulation of
compounds which may have longer in vivo half lives, when administered by oral, intravenous,
intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes.

In the preferred embodiment, the amino acids are in the (S) or L-configuration. If non-
naturally occurring side chains are used, non-amino acid substituents may be used, for
example to prevent or retard in vivo degradations. Proteins including non-naturally occurring
amino acids may be synthesized or, in some cases, made recombinantly; see van Hest et al.,
FEBS Lett. 428:(1-2) 68-70 May 22 1998 and Tang et al., Abstr. Pap. Am. Chem. S218:U138-
U138 Part 2 August 22, 1999, both of which are expressly incorporated by reference herein.

Aromatic amino acids may be replaced with D- or L-naphthylalanine, D- or L-
phenylglycine, D- or L-2-thienylalanine, D- or L-1-, 2-, 3- or 4-pyrenylalanine, D- or L-3-
thienylalanine, D- or L-(2-pyridinyl)-alanine, D- or L-(3-pyridinyl)-alanine, D- or L-(2-
pyrazinyl)-alanine, D- or L-(4-isopropyl)-phenylglycine, D-(trifluoromethyl)-phenylglycine, D-
(trifluoromethyl)-phenylalanine, D-p-fluorophenylalanine, D- or L-p-biphenylphenylalanine, D-
or L-p-methoxybiphenylphenylalanine, D- or L-2-indole(alkyl)alanines, and D- or L-alkylalanines
where alkyl may be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl,
isopropyl, iso-butyl, sec-isotyl, iso-pentyl, non-acidic amino acids, of C1-C20.

Acidic amino acids can be substituted with non-carboxylate amino acids while
maintaining a negative charge, and derivatives or analogs thereof, such as the non-limiting
examples of (phosphono)alanine, glycine, leucine, isoleucine, threonine, or serine; or sulfated
(e.g., -SO3H) threonine, serine, or tyrosine.

Other substitutions may include unnatural hydroxylated amino acids, which may be made by
combining "alkyl" with any natural amino acid. The term "alkyl" as used herein refers to a
branched or unbranched saturated hydrocarbon group of 1 to 24 carbon atoms, such as
methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, octyl, decyl, tetradecyl, hexadecyl,
eicosyl, tetracosyl and the like. Alkyl includes heteroalkyl, with atoms of nitrogen, oxygen and
sulfur. Preferred alkyl groups herein contain 1 to 12 carbon atoms. Basic amino acids may be
substituted with alkyl groups at any position of the naturally occurring amino acids lysine,
arginine, ornithine, citrulline, or (guanidino)-acetic acid, or other (guanidino)alkyl-acetic acids,
where "alkyl" is defined as above. Nitrile derivatives (e.g., containing the CN-moiety in place of
COOH) may also be substituted for asparagine or glutamine, and methionine sulfoxide may be
substituted for methionine. Methods of preparation of such peptide derivatives are well known
to one skilled in the art.

In addition, any amide linkage in any of the variant polypeptides can be replaced by a
ketomethylene moiety. Such derivatives are expected to have the property of increased
stability to degradation by enzymes, and therefore possess advantages for the formulation of
compounds which may have increased in vivo half lives, as administered by oral, intravenous,
intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes.

Additional amino acid modifications of amino acids of variant polypeptides of the
present invention may include the following: Cysteinyl residues may be reacted with alpha-
haloacetates (and corresponding amines), such as 2-chloroacetic acid or chloroacetamide, to
give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues may also be
derivatized by reaction with compounds such as bromotrifluoroacetone, alpha-bromo-beta-(5-
imidazolyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide,
methyl 2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol, or chloro-7-
nitrobenzo-2-oxa-1,3-diazole.

Histidyl residues may be derivatized by reaction with compounds such as
diethylpyrocarbonate, e.g., at pH 5.5 to 7.0, because this agent is relatively specific for the
histidyl side chain; para-bromophenacyl bromide may also be used, e.g., where the
reaction is preferably performed in 0.1 M sodium cacodylate at pH 6.0.
Lysinyl and amino terminal residues may be reacted with compounds such as succinic
or other carboxylic acid anhydrides. Derivatization with these agents is expected to have the
effect of reversing the charge of the lysinyl residues.

Other suitable reagents for derivatizing alpha-amino-containing residues include
compounds such as imidoesters, e.g., methyl picolinimidate; pyridoxal phosphate; pyridoxal;
chloroborohydride; trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and
transaminase-catalyzed reaction with glyoxylate. Arginyl residues may be modified by reaction
with one or several conventional reagents, among them phenylglyoxal, 2,3-butanedione, 1,2-
cyclohexanedione, and ninhydrin according to known method steps. Derivatization of arginine
residues requires that the reaction be performed in alkaline conditions because of the high pKa
of the guanidine functional group. Furthermore, these reagents may react with the groups of
lysine as well as the arginine epsilon-amino group. The specific modification of tyrosyl residues
per se is well-known, such as for introducing spectral labels into tyrosyl residues by reaction
with aromatic diazonium compounds or tetranitromethane.

N-acetylimidazole and tetranitromethane may be used to form O-acetyl tyrosyl species
and 3-nitro derivatives, respectively. Carboxyl side groups (aspartyl or glutamyl) may be
selectively modified by reaction with carbodiimides (R'-N=C=N-R') such as 1-cyclohexyl-3-(2-
morpholinyl-(4-ethyl)) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl) carbodiimide.
Furthermore, aspartyl and glutamyl residues may be converted to asparaginyl and glutaminyl
residues by reaction with ammonium ions.

Glutaminyl and asparaginyl residues may be frequently deamidated to the 
corresponding glutamyl and aspartyl residues. Alternatively, these residues may be 
deamidated under mildly acidic conditions. Either form of these residues falls within the scope
of the present invention. 

As used herein "side-chain placement algorithm" refers to methods for optimizing the
side-chain conformations of residues. Non-limiting examples of such methods include
International Patent Application No. WO 01/33438, De Maeyer et al. (De Maeyer et al., (2000)
Methods in Molecular Biology, vol. 143: Protein Structure Prediction: Methods and Protocols,
Webster, D. (Ed.), Humana Press Inc., Totowa, NJ, pp. 265-304), Koehl, P. and Delarue, M. (J.
Mol. Biol. (1994) 239, 249-275), Shenkin et al. (Shenkin, P.S. et al., (1996) Proteins 26,
323-352), Tufféry et al. (Tufféry, P. et al., (1997) Protein Eng. 10, 361-372), Holm and Sander
(Proteins (1992) 14, 213-223). Further included are methods which explicitly account for
pair-wise side-chain/side-chain interactions.

As used herein, "dead-end-elimination" or "DEE" refers to methods for testing which
side-chain conformations are energetically incompatible with the globally optimal side-chain
arrangement onto a protein backbone (or template) structure (e.g. Desmet, J. et al., (1992)
Nature 356, 539-542). In a protein system to be tested, each amino acid residue is first
represented by a limited set of discrete side-chain conformations obtained from a library of
theoretically possible conformations, also known as a rotamer library. To arrive at a globally
optimal conformation for the protein system, rotamers are screened in accordance with one or
more mathematical expressions, called DEE criteria. Different valid elimination criteria have
been identified in the past (De Maeyer, M., Desmet, J. and Lasters, I. (2000) The dead-end
elimination theorem: mathematical aspects, implementation, optimizations, evaluation and
performance. In: Methods in Molecular Biology, vol. 143: Protein Structure Prediction: Methods
and Protocols, Webster, D. (Ed.), and references therein). Upon convergence, all but one rotamer
has been eliminated for each modeled side-chain so that the final, unique assignment of rotamers
corresponds to the global optimum. If convergence cannot be reached by merely applying DEE
criteria, some additional end-stage routines are required (Desmet et al., 1997).
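
As a minimal illustration of a DEE criterion, the sketch below applies the original elimination test of Desmet et al. (1992): a rotamer is eliminated when its best-case energy (self energy plus the lowest pair energies with all other residues) still exceeds the worst-case energy of another rotamer at the same position. The data layout (`self_e`, `pair_e`) and function names are hypothetical choices for this example; the later, more powerful criteria and the end-stage routines mentioned above are not shown.

```python
from typing import Dict, List, Set, Tuple

PairTable = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair_e: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Look up E(i_r, j_s); tables are stored once, keyed by (low, high)."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dee(self_e: List[List[float]], pair_e: PairTable) -> List[Set[int]]:
    """Iterate the original DEE criterion until no rotamer can be removed."""
    n = len(self_e)
    alive: List[Set[int]] = [set(range(len(e))) for e in self_e]

    def bound(i: int, r: int, pick) -> float:
        # Self energy plus, per other residue j, the best (min) or worst (max)
        # pair energy over the rotamers of j that are still alive.
        return self_e[i][r] + sum(
            pick(pair_energy(pair_e, i, r, j, s) for s in alive[j])
            for j in range(n) if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_case = bound(i, r, min)
                if any(best_case > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)  # r cannot be part of the global optimum
                    changed = True
    return alive
```

A residue whose set shrinks to a single rotamer is resolved; if several rotamers survive at some positions, the criterion alone has not converged.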

As used herein "fast and accurate side-chain topology and energy refinement" or 
"FASTER" refers to methods of International Patent Application No. WO 01/33438 which is 
incorporated herein by reference. 
BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1. Schematic overview of the information generated by steps 1-4 of a method of the
present invention.

FIGURE 2. Flow chart of a method of the present invention.

FIGURE 3. Drawing of the 43 lowest energy peptides resulting from the VSV-8 docking. The
crystallographically determined structure is presented by the sticks model. Black color is used
for the main-chain atoms and gray for the side-chain atoms. Only "heavy" (non-H) atoms are
shown. The viewpoint is from the "side" of the peptide with the N-terminus at the left. In the
complex, the peptide is buried within the MHC α1α2 domain, with the α2-helix in front, the α1-
helix at the back and the β-sheet at the bottom; the upper part of the peptide is solvent
accessible. The MHC receptor itself, while present during docking, is not shown in the figure.

FIGURE 4. Comparison between crystallographic temperature factors and theoretical structure 
variation. The average B-factors for the main-chain atoms of each residue of the peptide 
LLFGYPVYV, obtained from the PDB entry 1DUZ (c-chain), are compared with the standard
deviation on the main-chain RMSD, observed in the ensemble of docked structures. The 
docking experiment itself is described in EXAMPLE 2 of the present invention. 

FIGURE 5. Distribution of the number of docking solutions. All nonapeptides derived from the 
HPV E6 and E7 proteins were docked to the A*0201 receptor according to the protocol 
described in EXAMPLE 2 of the present invention. Each experiment yielded a set of receptor- 
compatible structures, ranging from 0 to 500. This diagram shows the distribution of docking 
solutions. 27 peptides were found to be incompatible with the receptor (inset). The main 
reason was the presence of either a bulky (R, Y, F) or a main-chain restricting (P) side-chain at 
position P2. 

FIGURE 6. Probability distribution of the root-mean-square deviation (RMSD) between the 
backbone atoms of any two peptide main-chain structures of the {MHC/p_mc} ensemble
described in EXAMPLE 3 of the present invention. 
FIGURE 7. Distribution of predicted average binding energies of HPV E6 and E7 peptides to
HLA A*0201. Results are obtained as described in EXAMPLE 4 of the present invention. The 
energies do not include an entropical component. 

FIGURE 8. Correlation between experimental and predicted affinities for 15 peptides from HPV
E6 and E7 that are known to bind to HLA A*0201. Results are obtained as described in
EXAMPLE 4 of the present invention. Panel (a), scores obtained from average binding
energies only. Panel (b), scores obtained by including the entropical component. Two peptides
(sequences indicated) were considered as outliers and their scores were not included in the
regression analysis.
EXAMPLES

EXAMPLE 1. PEPTIDE DOCKING

In the present example, we describe the flexible docking of the octapeptide VSV-8
(peptide p = RGYVYQGL) to murine MHC class I H-2Kb (Fremont, D.H. et al., (1995) Proc.
Natl. Acad. Sci. USA 92, 2479-2483). The following experimental conditions were used.

1. Peptide build-up: Tyr-P5 was chosen as the root residue because of its potential to
form multiple contacts with the binding groove on the MHC. Elongation proceeded first towards
the C- and then towards the N-terminal end, in the following manner: ----Y--- > ----YQ-- >
----YQG- > ----YQGL > ---VYQGL > --YVYQGL > -GYVYQGL > RGYVYQGL.

2. Peptide translations: the peptide was systematically displaced to each of 79
translational offsets at relative distances of 1.0, 2.0 and 4.0 Å from the initial position.

3. Rotations: at each translational offset, discrete yet full-space rotation was performed 
over 84 rotational configurations. 

4. Conformations: for the peptide residues Tyr-P3, Val-P4, Tyr-P5 and Gln-P6 the
rotamer library contained 47 main-chain conformations; for Gly-P2 and Gly-P7 there were 125
rotamers and for the N- and C-terminal residues Arg-P1 and Leu-P8 there were 12.

5. Peptide and receptor side-chain conformations: side-chain conformations were
retrieved from the backbone-dependent rotamer library described in Desmet et al. (1997). On
average, there were 16 side-chain rotamers per residue. In addition to the 8 peptide residues,
28 receptor residues were assigned as flexible during the docking.

6. Force field: all-atom CHARMM force field comprising terms for bond stretching, bond
angle bending, a periodic function for the torsion angles, a Lennard-Jones potential for the
non-bonded atom pairs, a 10-12 potential for hydrogen bonds and a coulombic function for
charged atoms. A distance-dependent dielectric constant was used (ε = r_ij, where r_ij is the
distance between two atoms i and j; Warshel, A. and Levitt, M. (1976) J. Mol. Biol. 103, 227-
249).

7. Water molecules: this experiment was performed in the presence of 9 
crystallographically determined buried water molecules that were considered as part of the 
protein. 
8. Partial-peptide conformations (fragments) were accepted for further elongation while
using a relative energy threshold of 10 kcal mol⁻¹. In this experiment, final full-length peptides
were accepted using the same threshold.

9. The docking algorithm terminated spontaneously and successfully after having
elongated in a combinatorial fashion, i.e. residue by residue, all partial peptides to their full
length.
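
The build-up order of item 1 (root residue first, then elongation towards the C-terminus, then towards the N-terminus) can be sketched as below. The function name and the dash-padded fragment notation are choices made for this illustration; the sketch only enumerates the sequence of fragments and does not perform any docking.

```python
from typing import List


def buildup_order(peptide: str, root_position: int) -> List[str]:
    """List the fragments grown from a root residue (1-based position).

    Elongation proceeds first towards the C-terminus, then towards the
    N-terminus; positions not yet built are shown as dashes.
    """
    n = len(peptide)
    if not 1 <= root_position <= n:
        raise ValueError("root position must lie within the peptide")

    def render(start: int, end: int) -> str:
        return "-" * start + peptide[start:end] + "-" * (n - end)

    start, end = root_position - 1, root_position
    fragments = [render(start, end)]
    while end < n:      # grow towards the C-terminal end
        end += 1
        fragments.append(render(start, end))
    while start > 0:    # then grow towards the N-terminal end
        start -= 1
        fragments.append(render(start, end))
    return fragments


if __name__ == "__main__":
    print(" > ".join(buildup_order("RGYVYQGL", 5)))
```

For VSV-8 with Tyr-P5 as root this yields eight fragments, one per fragment length.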

The docking of the VSV-8 peptide to MHC class I H-2Kb finally yielded a {MHC/p_full}
ensemble of 323 full-peptide configurations within an energy interval of 10 kcal mol⁻¹ (see
TABLE 1). For this purpose, 1,117,957 partial peptide fragments had been processed during
buildup.



length  peptide         #conf   #accep  %accep   E_best   E_incr
1       ----Y---      311,892      920    0.29    -24.4    -24.4
2       ----YQ--       43,240    2,074    4.80    -43.8    -19.4
3       ----YQG-      259,250   13,081    5.05    -51.2     -7.4
4       ----YQGL      156,972      289    0.18    -73.9    -22.7
5       ---VYQGL       13,583    1,064    7.83    -82.0     -8.1
6       --YVYQGL       50,008    1,148    2.30   -109.5    -27.5
7       -GYVYQGL      143,500   11,626    8.10   -120.1    -10.6
8       RGYVYQGL      139,512      323    0.23   -147.1    -27.0
sum or average:     1,117,957   30,525    2.73             -18.4



TABLE 1. VSV-8 docking. Column 1: fragment length (number of residues); column 2:
fragment sequence in one-letter code; column 3: total number of generated configurations for
fragments of the corresponding length; column 4: number of accepted configurations; column
5: acceptance ratio in %; column 6: binding energy of the lowest-energy fragment (kcal mol⁻¹);
column 7: incremental binding energy (kcal mol⁻¹).
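
The counts in TABLE 1 can be cross-checked with a short Python sketch. The product rule used here is inferred from the numbers and is not stated explicitly in the text: for the root residue, translations × rotations × main-chain conformations; for each later step, previously accepted fragments × main-chain conformations of the added residue. The accepted counts are copied from the table and the variable names are hypothetical.

```python
N_TRANSLATIONS = 79
N_ROTATIONS = 84

# Main-chain conformations of the residue added at each step, in build-up
# order: Tyr-P5, Gln-P6, Gly-P7, Leu-P8, Val-P4, Tyr-P3, Gly-P2, Arg-P1.
CONFS_ADDED = [47, 47, 125, 12, 47, 47, 125, 12]

# Accepted configurations per fragment length (column 4 of TABLE 1).
ACCEPTED = [920, 2074, 13081, 289, 1064, 1148, 11626, 323]


def generated_counts() -> list:
    """Reconstruct column 3 of TABLE 1 under the inferred product rule."""
    counts = [N_TRANSLATIONS * N_ROTATIONS * CONFS_ADDED[0]]
    for step in range(1, len(CONFS_ADDED)):
        counts.append(ACCEPTED[step - 1] * CONFS_ADDED[step])
    return counts


def acceptance_ratios() -> list:
    """Acceptance ratio in percent (column 5), rounded to two decimals."""
    return [round(100.0 * a / g, 2) for a, g in zip(ACCEPTED, generated_counts())]


if __name__ == "__main__":
    for length, (g, a, pct) in enumerate(
        zip(generated_counts(), ACCEPTED, acceptance_ratios()), start=1
    ):
        print(f"{length}  {g:>9,}  {a:>7,}  {pct:5.2f}")
    print("total generated:", f"{sum(generated_counts()):,}")
```

Under this reading, every generated count in column 3 and the overall total of 1,117,957 fragments are reproduced exactly.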

Importantly, the docking algorithm rebuilds all side-chain conformations completely 
from scratch each time a partial or full peptide configuration is generated. In the present 
example this was accomplished by a dead-end elimination (DEE) method. In total, 1,117,957
separate DEE side-chain placement operations were performed, i.e. one for each peptide
fragment. This approach might be described as an elegant way to decouple the side-chain
modeling from the main-chain construction. It enormously reduces the space to be searched
and yet avoids any potential bias from incorrectly positioned or frozen side-chains. As a
possible alternative to the DEE method, the present inventors refer to the recently published
FASTER method (Desmet et al., 2002). In general, any method for side-chain placement may
be applicable. Prediction accuracy may actually form a lesser problem in view of the fact that
the modeling of side-chains is repeated completely in step 3 of a method of the present
invention, but then only for the final full-length peptides, i.e. in the present example only 323
full structures instead of more than one million partial structures.

In summary, Table 1 shows that the acceptance ratio of partial peptide fragments was
as low as 30,525 out of a total of 1,117,957 examined fragments, or 2.73%. Higher acceptance
ratios were observed when extending a fragment by a weakly restrained residue type, such as
Gly at position P2. Yet, the combinatorial buildup did not lead to an explosion of fragments.

Of the 323 final structures within an energy interval of 10 kcal mol⁻¹, 43 had a binding
energy within 5 kcal mol⁻¹ above the lowest (-147.1 kcal mol⁻¹) and are displayed in FIGURE 3.
Compared with the experimental structure of the complex, the lowest-energy peptide had a
main-chain RMSD of only 0.56 Å. For the 43 displayed structures the average RMSD was 0.89
± 0.27 Å and for all 323 results it was 1.01 ± 0.39 Å. The anchor residues Tyr-P3, Tyr-P5 and
Leu-P8 were correctly packed into their complementary pockets (Fremont, D.H. et al., (1992)
Science 257, 919-927). The side-chain of Leu-P8 adopted two different conformational states.
Other apparently bi-stable conformations were observed for Gln-P6 and Arg-P1 (FIGURE 3).
The side-chain conformation of Gln-P6 was clearly coupled to the conformation of the MHC
residues Glu-152 and Arg-155. Interestingly, the alternative conformation for these two
residues has also been crystallographically observed, namely in the structure of the same H-
2Kb receptor complexed with the nonapeptide SEV-9 (Fremont et al., 1992). This illustrates the
importance of taking into account at least some limited flexibility for the side-chains of the
receptor.
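
For reference, the main-chain RMSD values quoted above can be computed along the following lines. This sketch assumes that the docked and experimental peptides are already expressed in the same receptor frame, so no superposition is performed, and that both coordinate lists hold the same backbone atoms in the same order; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both structures need the same non-zero number of atoms")
    # Sum of squared coordinate differences over all atoms and axes.
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))
```

Coordinates in Å give an RMSD in Å.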
EXAMPLE 2. SYSTEMATIC DOCKING OF VIRAL PEPTIDES

This example illustrates the performance of the docking algorithm described in
EXAMPLE 1 in an application to large-scale docking. The purpose of this example is to
demonstrate that the algorithm is useful not only for studying selected cases that are
known to form high-affinity complexes, but also for handling a large number of diverse peptides
derived from a common protein source. Some features of such a collection are (i) that the set
of peptides is not biased with respect to the presence of anchor residues and (ii) that the
majority of peptides are most likely non-binders. Attention is paid to the computational
requirements of the method, to statistics of the simulated structures and to potential difficulties
in large-scale docking. This example also illustrates the preferred embodiment of steps 1 and 2
of a method of the present invention, i.e. MHC model preparation and flexible docking,
respectively. In addition, we have performed a clustering analysis on the different observed
peptide binding modes in order to study the (theoretical) variability of the main-chain of a
peptide in a complex.

The test case was constructed as follows. 

1. MHC receptor type/subtype: class I, A*0201 

2. PDB structure for model preparation: 1DUZ a-chain 

3. List of peptides to be docked: all nonameric (9-residue) peptides that can be derived
from the human papillomavirus type 18 (HPV-18) E6 and E7 proteins, i.e. 150 and 97 peptides,
respectively. Experimental binding affinities for the same set are available from the literature
(Rudolf, M.P. et al., (2001) Clin. Cancer Res. 7, 788s-795s).

4. Docking conditions: force field and rotamer library are identical to Example 1. 
Translations were limited to 26 relative displacements over 0.5 A from the original position. No 
rotational moves were allowed. All crystallographic water molecules were removed. The 
peptide residue P1 was selected as ttie root residue, tfius elongation of fragments occun-ed 
from the N- to the C-tennlnus. The relative energy threshold for accepting partial peptide 
fragments was made dependent on the fragment length: 7. 7. 10. 13. 15. 15, 15. 13 and 10 for 
lengths 1-9. respectively. This was necessary because partial pepti'des of intermediate length 
tended to form many tight but false interactions with the receptor (class I nonapeptldes 
typically bulge out in the middle; Fremont et al., 1992). 
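
The set-up of items 3 and 4 can be sketched as follows. This is a minimal illustration and not the inventors' implementation: the function names are hypothetical, and the reading of the "relative energy threshold" as a tolerance above the best fragment energy found at the same length is an assumption. The counts of 150 and 97 nonamers correspond to sequence lengths of 158 and 105 residues (L - 9 + 1).

```python
def nonamers(protein: str, length: int = 9) -> list[str]:
    """Return every contiguous peptide of the given length (sliding window)."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Relative energy thresholds for fragment lengths 1..9, as listed in item 4.
THRESHOLDS = dict(zip(range(1, 10), (7, 7, 10, 13, 15, 15, 15, 13, 10)))


def fragment_accepted(energy: float, best_energy: float, length: int) -> bool:
    """Accept a partial peptide of the given length.

    Assumption: "relative" means relative to the best (lowest) energy
    observed so far among fragments of the same length.
    """
    return energy - best_energy <= THRESHOLDS[length]
```

The wider thresholds at intermediate lengths (15 for lengths 5-7) retain fragments that score poorly before the peptide is complete, which is the behavior motivated in item 4.
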

The selection of the PDB structure 1DUZ to construct the MHC template model was
decided on the basis of its high crystallographic resolution (1.8 Å). The whole PDB entry (chains
a-e) was refined by 200 steps of steepest descent energy minimization. Next, chains a (MHC) and
c (peptide sequence LLFGYPVYV) were extracted. The only PDB information regarding the
peptide that was retained upon docking was the coordinates of the backbone N, Cα and C
atoms of residue P1. Prior to docking, each peptide was initialized by rebuilding it in an
extended conformation with standard bond lengths and angles. The N, Cα and C atoms at
residue P1 of the initialized peptide were fitted onto those observed in the PDB structure. Next,
the peptide of the PDB file was removed. The MHC receptor together with the initialized
peptide formed the starting situation for docking. A number of trial dockings were then
performed using the "self" peptide LLFGYPVYV in order to determine the optimal settings for
the relative energy thresholds of partial peptides of different length (values given supra, see 4.
Docking conditions). These trial experiments also served to reduce, in a safe way, the number
of flexibly treated receptor side-chains: of the initial 29 side-chains in contact with the peptide,
only 14 were finally kept flexible because they had a significant influence on the final ensemble of
predicted structures (a7, a63, a66, a70, a73, a80, a84, a97, a99, a114, a116, a143, a146 and
a159). With these settings, an ensemble of 210 structures was obtained for the
A*0201/LLFGYPVYV complex. All peptide conformations compared well with the known
crystallographic structure: the backbone RMSD ranged from 0.75 to 1.81 Å, with an average of
1.08 ± 0.20 Å. A good correlation was observed between the crystallographic temperature
factors and the structural variation exhibited by the ensemble of docked structures (Figure 4).
The B-factors, averaged over the main-chain atoms of each peptide residue, appeared to
follow well the standard deviation on the main-chain RMSD with the crystallographic structure,
abbreviated as SD(RMSD). The latter was taken as a measure of the theoretical flexibility of
the peptide main-chain. A somewhat larger than expected flexibility was observed for Gly-P4,
which was due to a high degree of torsional freedom of the peptide planes flanking P4. A
surprisingly high flexibility was also observed for Pro-P6: the Cα-Cβ vector of this residue
displayed a relatively large rotational variation over ~90° around the peptide's principal axis.
Yet, this theoretical result appears to be fully justified on the basis of the experimental B-factors.
Also, the general correlation between both parameters suggests that the computed ensemble
reflects the real dynamic behavior of the bound peptide. Given these satisfactory results, it was
concluded that the experimental settings were correctly chosen. The latter were applied in all
subsequent docking experiments.
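
The quantity SD(RMSD) used above can be illustrated with a short sketch. The data layout (a structure as a list of residues, each residue a list of main-chain atom coordinates) and the function names are assumptions made for illustration only. No superposition is performed, on the assumption that all docked structures share the coordinate frame of the fixed receptor.

```python
import math
from statistics import stdev

Atom = tuple[float, float, float]


def rmsd(a: list[Atom], b: list[Atom]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("atom lists must be non-empty and of equal length")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def sd_rmsd_per_residue(ensemble, reference) -> list[float]:
    """For each residue, the standard deviation over the ensemble of the
    main-chain RMSD with respect to the reference (crystal) structure."""
    return [
        stdev([rmsd(model[r], ref_atoms) for model in ensemble])
        for r, ref_atoms in enumerate(reference)
    ]
```

The resulting per-residue values are the kind of quantity that is compared with the residue-averaged B-factors in Figure 4.
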

The large-scale docking of all HPV E6 and E7 peptides was performed in an automated
fashion. The jobs were distributed over a cluster of four SGI Origin 200 computers, each
equipped with four 270 MHz R12000 processors and 4 GB of memory. The average
computational time needed per job was 8.7 CPU-hours, but some terminated almost
immediately (0.01 CPU-h) or took a very long time (113.6 CPU-h). Typically, the docking of
peptides containing large side-chains (Phe, Tyr, Arg) or Pro at position P2 tended to terminate
before reaching their full length (FIGURE 5). Analysis showed that the P2 residue of these
peptides could be accommodated only in "non-standard" conformations, for sterical reasons.

Rudolf et al. (2001) published experimental affinity data for peptides derived from the
HPV E6 and E7 sequences and binding to HLA A*0201. Fifteen out of the 247 displayed IC50
values ranging from 3 to 943 nM. These peptides can thus be classified as strong or moderate
binders to HLA A*0201. All other possible E6 and E7 peptides had IC50 values higher than
1000 nM and can be termed weak or non-binders. Interestingly, many of the binding peptides
had amino acid residues at positions P2 and P9 (the so-called primary anchor positions) that
were non-typical for binding to HLA A*0201. For example, the top-ranked peptide,
FAFKDLFVV (with Ala at position P2 instead of Leu, Ile or Met) displayed an IC50 value of only
3 nM. The peptide FKDLFVVYR (with Lys at P2 and Arg at P9), being a very non-typical
peptide, still had an IC50 value of 500 nM. Two other binding peptides also had a non-typical
aromatic residue at position P2, namely LYNLLIRCL and LFLNTLSFV. Especially for these
peptides it was interesting to investigate the behavior of the docking algorithm.

It can be seen from Figure 5 that none of the docking experiments failing to extend the
peptide to its full length (26 out of 247 in total) concerned binding peptides (15 out of 247).
Even the two binding peptides containing Tyr or Phe at position P2 could be successfully
docked (the LYNLLIRCL and LFLNTLSFV docking resulted in 8 and 13 solutions,
respectively), in contrast to many other peptides containing an aromatic side-chain at that
position (Figure 5). The FKDLFVVYR peptide could also be successfully docked (30 solutions)
in spite of its bulky Arg side-chain at P9. In general, large side-chains at the primary anchors
P2 and P9 had the effect of reducing the number of docking solutions due to sterical restraints.
For some peptides, all of which are weak or non-binders, this led to premature termination of
the docking process.

Another important observation was that the binding peptides had, on average, a much
higher number of docking solutions than the non/weak binders. Binding peptides were
represented by about twice as many solutions as non/weak binders (on average: 91 vs. 42
solutions, respectively). Similarly, only 3 of the 15 binders (20%) had fewer than 25 solutions
whereas there were 132 of the 232 (57%) with fewer than 25 solutions among the non/weak
binders. A logical conclusion is that the number of solutions obtained from the peptide docking
experiments provides an indication of the true conformational flexibility of a peptide within the MHC
binding groove. This is consistent with the fundamental entropical principle stating that the
higher the number of micro-states for a given macro-state (in this case the bound state) the
higher will be the probability of that state. This example also illustrates the importance of
working with ensembles of structures, rather than with a single modeled structure, to study the
binding properties of MHC/peptide complexes.
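
The two statistics quoted above (mean number of solutions and the fraction of peptides with fewer than 25 solutions) can be computed per group as sketched below. The function name is hypothetical and the per-peptide counts themselves are not reproduced here; only the summary values appear in the text.

```python
def summarize_solutions(counts: list[int], cutoff: int = 25) -> tuple[float, float]:
    """Return (mean number of docking solutions, fraction of peptides
    with fewer than `cutoff` solutions) for one group of peptides."""
    if not counts:
        raise ValueError("at least one peptide is required")
    mean_count = sum(counts) / len(counts)
    fraction_below = sum(1 for n in counts if n < cutoff) / len(counts)
    return mean_count, fraction_below
```

Applied separately to the binders and the non/weak binders, this yields the pairs of numbers reported in the text (91 and 20% versus 42 and 57%).
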

EXAMPLE 3. CONSTRUCTION OF A GENERIC MHC/PEPTIDE DATABASE 

An embodiment of the present invention is a method wherein the binding of one or 
more peptides is studied by applying an advanced database approach. As explained in the 
detailed description of the invention, such a database may be compiled from experimental 
(preferably X-ray) or theoretical (preferably docked) structures. A database obtained from 
known 3D structures has the advantage of being based on validated structural information but 
may suffer from the lack of such data, especially for certain MHC subtypes for which no 
complex structure has been solved. Even for well-represented subtypes, like the MHC class I 
HLA A*0201 allotype, there may be a strong bias towards particular observed peptide binding 
modes whereas many other feasible conformations are not yet represented in the Protein
Data Bank. Consequently, in order to avoid problems related to a lack of experimental
structures, the present inventors prefer to generate a database of MHC/Pmc structures by
systematically docking a large number of peptides of different sequence. Evidently, this can be
done separately for different MHC subtypes and for peptides of different length. In this
example we illustrate the construction of an {MHC/Pmc} ensemble for nonameric peptides
oriented within the binding groove of HLA A*0201 (represented by PDB code 1DUZ, chain a).

The docking experiments were performed in an identical way to the experiments 
described in Example 2. A set of 180 nonameric peptide sequences to be docked was 
established in a pseudo-random fashion as follows. The present inventors have selected 
combinations of typical anchor residues at positions P2 and P9, i.e. Leu, Ile and Met at P2 and
Leu, Ile and Val at P9. At all other positions, residue types were selected in a fully random
fashion from the set of naturally occurring amino acids. This means that each of the 3x3=9
possible P2/P9 combinations was represented by 180/9=20 sequences with randomized
residues at positions P1 and P3-P8. This procedure was followed to avoid the docking of
peptides that cannot bind to the HLA A*0201 model because of incompatible anchor residues.
At the same time, the randomization was assumed to generate sufficient variation in the 
peptide sequences to ensure a broad and unbiased sampling of the conformational space. 
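
The pseudo-random selection described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: uniform sampling over the 20 standard amino acids, a fixed seed, and no removal of duplicate sequences are choices made here and are not stated in the text.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 naturally occurring residue types
P2_ANCHORS = "LIM"                     # Leu, Ile, Met at P2
P9_ANCHORS = "LIV"                     # Leu, Ile, Val at P9


def make_peptide_set(per_combination: int = 20, seed: int = 0) -> list[str]:
    """Generate nonamers with fixed P2/P9 anchors and random other positions."""
    rng = random.Random(seed)
    peptides = []
    for p2, p9 in product(P2_ANCHORS, P9_ANCHORS):
        for _ in range(per_combination):
            residues = [rng.choice(AMINO_ACIDS) for _ in range(9)]
            residues[1], residues[8] = p2, p9  # positions P2 and P9
            peptides.append("".join(residues))
    return peptides
```

With the default arguments the function returns 9 x 20 = 180 sequences, matching the size of the set described above.
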

All but one of the docking experiments terminated in a successful way, i.e. only one
simulation (of the peptide p = DIGVHKWW) terminated before the peptide was extended to its
full length. All other simulations yielded a number of MHC/pmc solutions ranging from 1 to 500
(a user-set hard limit) and with an average of 22 per peptide. The total number of MHC/pmc
structures was 3951.

All docking results were then pooled into one global {MHC/Pmc} ensemble, the side-chains
were stripped off and the coordinates of the main-chain atoms of each peptide structure
were stored in a suitable format in a database. This completed the construction of a generic
database collection of MHC/Pmc structures, applicable for studying the binding of nonapeptides
to the MHC class I HLA A*0201 subtype.

The ensemble was afterwards further analyzed with respect to the spatial distribution of
peptide conformations in the {MHC/Pmc} ensemble. A suitable parameter to analyze this
distribution is the peptide backbone root-mean-square deviation (RMSD) between different Pmc
structures in the ensemble. FIGURE 6 shows the probability distribution of finding two
main-chain structures having a certain RMSD. From the integrated probability curve it is seen that
for any selected Pmc structure the expected number of other structures with an RMSD ≤ 0.5 Å
is only about 0.3% of the total population. This shows that there is very limited, if any,
redundancy among the members of the ensemble. The probability of an RMSD ≤ 1 Å rises to
0.062 or 6.2%. With respect to modeling side-chains on backbones, a difference in RMSD of
up to 1 Å can be expected to yield similar results. In other words, the further modeling of a
peptide sequence onto each Pmc structure will be statistically performed onto 0.062x3951 or
about 250 relatively correct structures. This situation offers the possibility of a further clustering
of the ensemble and/or the averaging of the results from different side-chain placements.
Furthermore, the width of the probability distribution (~3 Å) suggests that a great variety of
different binding modes, some of which may be required for specific peptides, are represented
in the ensemble. From these results, the inventors concluded that the database approach
forming an embodiment of the present invention may be very useful to predict the binding
properties of a peptide within an MHC binding groove.
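
The integrated probability discussed above is the fraction of structure pairs whose backbone RMSD falls at or below a cutoff. A minimal sketch is given below, assuming each main-chain structure is stored as a list of (x, y, z) coordinates in a common receptor frame; the function names are hypothetical.

```python
import math
from itertools import combinations


def backbone_rmsd(a, b) -> float:
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def fraction_within(structures, cutoff: float) -> float:
    """Fraction of all structure pairs with a backbone RMSD <= cutoff."""
    pairs = list(combinations(structures, 2))
    close = sum(1 for a, b in pairs if backbone_rmsd(a, b) <= cutoff)
    return close / len(pairs)


# Expected number of structures within 1 Angstrom of a given structure,
# using the probability and ensemble size reported in the text.
expected_neighbours = 0.062 * 3951
```

Evaluating `fraction_within` over a range of cutoffs gives an integrated curve of the kind shown in FIGURE 6.
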

EXAMPLE 4. APPLICATION OF A SCORING FUNCTION TO PREDICT AFFINITIES 

A property of an MHC/peptide complex is the affinity of the peptide for the MHC 
molecule. In accordance with the structure-based approach of the present invention, the 
binding affinity is predominantly derived from information related to the three-dimensional 
structure of a modeled complex. For this purpose, a so-called scoring function is required 
which translates structural information into one or more contributions that are expected to 
correlate with experimental affinity. Different contributions may be combined, for example 
added up, in order to provide a qualitative or quantitative score for an MHC/peptide complex of
interest. By extension, different scores for different complexes may be computed, for example
to rank different peptides according to their predicted affinity for a given MHC. 

This example is included to illustrate a practical implementation of an embodiment of 
the present invention. This example is further included to demonstrate that the incorporation of 
an entropical contribution derived from an ensemble of modeled complex structures, rather 
than from a single modeled or experimental structure, significantly enhances the quality of 
predicted affinities. Said incorporation of an entropical component is in agreement with both 
Eqs. [1] and [5] of the present invention. 

The results of the docking experiments described in Example 2, more specifically the
computer simulated binding of all HPV E6/E7 peptides to the HLA A*0201 receptor, have been
further analyzed so as to eventually predict the affinity of the peptides. We recall that each of
these docking experiments yielded an ensemble of MHC/pmc solutions, in accordance with a
second step (MHC/peptide main-chain construction) of an embodiment of the present 
invention. These ensembles have been further processed in accordance with a third step 
(MHC/full peptide construction) and a fourth step (MHC/peptide affinity assessment) of an 
embodiment of the present invention. 

First, the side-chains of each MHC/pmc structure in each ensemble were rebuilt by
applying the DEE method of De Maeyer et al. (2000). Side-chains of the MHC receptor that
were flexibly treated were the same as during the docking experiments described in Example 2
(14 in total). In order to reduce the effects from discrete rotameric placement of the side-chains,
an additional modeling step was performed on each DEE-modeled structure: the full
structures were further refined by 50 steps of steepest descent energy minimization to optimize
local contacts. This resulted in the final set of ensembles {MHC/pm}, i.e. one ensemble of full
complex structures for each peptide p. These data formed the major source of structure-related
input information for a fourth step of an embodiment of the present invention.

Since complex formation involves a physico-chemical reaction between a receptor and
ligand molecule from the unbound to the bound state, the binding process is driven by a
change in free energy or ΔG (see Eqs. [3] and [4]). Consequently, an energetical evaluation of
complex structures is preferably complemented by a similar evaluation of models of the
unbound molecules. The free MHC receptor was therefore modeled separately by performing
DEE side-chain placement with the same 14 flexibly treated side-chains as for the full
complexes, followed by 50 steps of steepest descent energy minimization. Structures for the
free peptide, on the other hand, were not generated by DEE modeling but by generating
maximally extended conformations, also followed by 50 steps of steepest descent energy
refinement. The binding energy Ebind(p,i) of a solution i from the ensemble generated for a
peptide p was calculated using equation [6]:

Ebind(p,i) = Ecomplex(p,i) - EMHC - Ep(p)    [6]

where all energy values are the potential energies computed in accordance with the force field,
and where Ecomplex(p,i), EMHC and Ep(p) are the potential energy of the complex, free receptor
and free peptide, respectively. Next, the binding energies were averaged over all solutions i for
each peptide p so as to obtain the average binding energy <Ebind(p)> for each ensemble
of full complex structures. This quantity corresponds to the term <E> in Eq. [1] of the present invention.
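
Equation [6] and the subsequent averaging can be written out as a short sketch. The function and argument names are hypothetical; the energies are assumed to be potential energies in a single consistent unit, as computed by the force field.

```python
from statistics import mean


def binding_energy(e_complex: float, e_mhc: float, e_peptide: float) -> float:
    """Eq. [6]: energy of the complex minus the energies of the free
    receptor and the free peptide."""
    return e_complex - e_mhc - e_peptide


def average_binding_energy(complex_energies: list[float],
                           e_mhc: float, e_peptide: float) -> float:
    """Average of Eq. [6] over all solutions i of the ensemble of one peptide."""
    return mean(binding_energy(e, e_mhc, e_peptide) for e in complex_energies)
```

Because the free-receptor and free-peptide energies do not depend on the solution index i, they act as constant offsets within one ensemble; only the complex energy varies from solution to solution.
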

Figure 7 shows the distribution of the average binding energies for all predicted 
peptides. Peptides that were experimentally found to be good binders by Rudolf et al. (2001)
are indicated in black whereas the non-binders are indicated with gray bars. It is clearly seen
that the known binders tend to score well in comparison with the non-binders. Yet, both
populations are not clearly separated in that several non-binders score better than most of the
binders (they can be envisaged as "false positives"). This suggests that the discriminative
power of potential energy alone is not strong enough to obtain good separation. 

In view of the observation that most of the non-binding peptides had, on average, fewer
MHC/pmc solutions in the docking step (see Example 2), it was investigated whether this factor
could be converted into a significant, quantitative contribution of the scoring function. The most
significant improvement in separation between binders and non-binders was obtained when
adding to the potential energy term a logarithmic term depending on the total number of
solutions N contained within each ensemble. Thus, the optimal scoring function F appeared to
be of the form

F(p) = <Ebind(p)> - c x ln N(p)    [7]

wherein c is a constant. Interestingly, the theory of statistical mechanics states that the entropy
of (microcanonical) ensembles is logarithmically related to the number of micro-states that are
energetically accessible. (More specifically, the entropy S equals kB ln(N) where kB is
Boltzmann's constant.) Thus, it was straightforward to rationalize the logarithmic dependence
on the number of solutions as a true reflection of the intrinsic conformational flexibility of a peptide
within a complex. In other words, the number of energetically feasible peptide conformations
as derived from the simulations probably correlates in a statistically significant way with the
true conformational entropy of a complex.
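
Equation [7] can be sketched in a few lines. The function name is hypothetical, and the default c = 20 is the value reported for this example in the text that follows; a lower (more negative) score indicates a better predicted binder.

```python
import math


def score(avg_binding_energy: float, n_solutions: int, c: float = 20.0) -> float:
    """Eq. [7]: F(p) = <Ebind(p)> - c * ln N(p)."""
    if n_solutions < 1:
        raise ValueError("at least one docking solution is required")
    return avg_binding_energy - c * math.log(n_solutions)
```

As an illustration of the size of the logarithmic term: at equal average binding energy, an ensemble of 91 solutions scores 20 x ln(91/42), or roughly 15.5 energy units, better than an ensemble of 42 solutions (the average counts for binders and non/weak binders in Example 2). With a single solution the term vanishes, since ln 1 = 0.
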

From the optimization of the separation of binders and non-binders, the best value for
parameter c in Eq. [7] was found to be 20 kcal/mol. This value was applied in a further
analysis wherein the predicted scores for the 15 binding peptides were directly correlated with
the known experimental affinity (Rudolf et al. (2001) only published quantitative values for the
binding peptides). Figure 8 shows a correlation plot between predicted scores and known
binding free energies. In Figure 8a the entropical term is turned off (c=0) while in Figure 8b it
was set to its optimal value from the previous optimization procedure (c=20). Two peptides
(FQQLFLNTL and FLNTLSFVC) showed an aberrant behavior compared to the rest and were
considered as outliers. They were not included in the regression analysis. Interestingly, both
peptides have a non-typical anchor residue (Gln at P2 of FQQLFLNTL and Cys at P9 of
FLNTLSFVC) while their scores appeared to be overestimated. This suggests that an
additional correction factor may be desirable for typical anchor residues.

An important observation within the context of the present invention was the markedly
better correlation obtained with the scoring function including the entropical term (panel b, R² =
0.71) compared to the function based exclusively on potential energy (panel a, R² = 0.19).
Without the entropy component only a very weak correlation could be observed. This is
consistent with the distribution plot presented in Figure 7 showing that the energy component
itself is practically useful only to identify peptides with a clearly suboptimal energetic
compatibility with the receptor. Only the combination of potential energy with a term reflecting
conformational entropy enabled a good qualitative separation between binding and non-binding
peptides. Furthermore, it enabled the establishment of a quantitative relationship
between predicted and experimental affinities. Figure 8b shows the equation that can be used
to convert any score value F into a predicted free energy of binding.
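
A regression of the kind shown in Figure 8 can be sketched as an ordinary least-squares fit. The coefficients of the equation in Figure 8b are not given in the text and are not reproduced here. The conversion of an IC50 value into a binding free energy via RT ln(IC50) is an assumption made for illustration, as are the temperature of 298.15 K and the function names.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at 298.15 K (assumed temperature)


def delta_g_from_ic50(ic50_molar: float) -> float:
    """Approximate binding free energy (kcal/mol) from an IC50 in mol/l.
    Assumption: the IC50 is used as a stand-in for the dissociation constant."""
    return RT * math.log(ic50_molar)


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept; also returns R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Given scores F as x and experimental free energies as y, the fitted slope and intercept provide a conversion from any score into a predicted free energy of binding, and the returned R squared is the type of statistic compared between panels a and b.
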